Type:Package
Title:Modelling of Extreme Values
Version:2.1
Description:Various tools for the analysis of univariate, multivariate and functional extremes. Exact simulation from max-stable processes (Dombry, Engelke and Oesting, 2016, <doi:10.1093/biomet/asw008>), R-Pareto processes for various parametric models, including Brown-Resnick (Wadsworth and Tawn, 2014, <doi:10.1093/biomet/ast042>) and Extremal Student (Thibaud and Opitz, 2015, <doi:10.1093/biomet/asv045>). Threshold selection methods, including Wadsworth (2016) <doi:10.1080/00401706.2014.998345>, and Northrop and Coleman (2014) <doi:10.1007/s10687-014-0183-z>. Multivariate extreme diagnostics. Estimation and likelihoods for univariate extremes, e.g., Coles (2001) <doi:10.1007/978-1-4471-3675-0>.
License:GPL-3
URL:https://lbelzile.github.io/mev/, https://github.com/lbelzile/mev/
BugReports:https://github.com/lbelzile/mev/issues/
Depends:R (≥ 3.4)
Imports:alabama, methods, nleqslv, numDeriv, Rcpp (≥ 0.12.16), Rsolnp, stats
Suggests:boot, cobs, evd, expint, knitr, MASS, mvPot (≥ 0.1.4), mvtnorm, gmm, revdbayes, rmarkdown, ismev, tinytest, TruncatedNormal (≥ 1.1)
LinkingTo:Rcpp, RcppArmadillo
LazyData:true
RoxygenNote:7.3.3
VignetteBuilder:knitr
Encoding:UTF-8
ByteCompile:true
NeedsCompilation:yes
Packaged:2025-11-10 18:08:19 UTC; lbelzile
Author:Leo Belzile [aut, cre], Jennifer L. Wadsworth [aut], Paul J. Northrop [aut], Raphael Huser [aut], Scott D. Grimshaw [aut], Jin Zhang [ctb], Michael A. Stephens [ctb], Art B. Owen [ctb]
Maintainer:Leo Belzile <belzilel@gmail.com>
Repository:CRAN
Date/Publication:2025-11-11 06:10:27 UTC

Joint maximum likelihood estimation for exponential model

Description

Calculates the MLEs of the rate parameter, and the joint asymptotic covariance matrix of these MLEs over a range of thresholds as supplied by the user.

Usage

.Joint_MLE_Expl(x, u = NULL, k, q1, q2 = 1, param)

Arguments

x

vector of data

u

vector of thresholds. If not supplied, then k thresholds between the quantiles (q1, q2) will be used

k

number of thresholds to consider if u is not supplied

q1

lower quantile to consider for threshold

q2

upper quantile to consider for threshold

param

character specifying 'InvRate' or 'Rate' for the inverse rate parameter or the rate parameter, respectively

Value

a list with

Author(s)

Jennifer L. Wadsworth


GP fitting function of Grimshaw (1993)

Description

Function for estimating the parameters k and a for a random sample from a GPD.

Usage

.fit.gpd.grimshaw(x)

Arguments

x

sample values

Value

a list with the maximum likelihood estimates of the components a and k

Author(s)

Scott D. Grimshaw


Robust threshold selection of Dupuis

Description

The optimal bias-robust estimator (OBRE) for the generalized Pareto. This function returns robust estimates and the associated weights.

Usage

.fit.gpd.rob(dat, thresh, k = 4, tol = 1e-05, show = FALSE)

Arguments

dat

a numeric vector of data

thresh

threshold parameter

k

bound on the influence function; the constant k is a robustness parameter (higher bounds are more efficient, lower bounds are more robust). Defaults to 4.

tol

numerical tolerance for OBRE weights iterations.

show

logical; should diagnostics and estimates be printed? Defaults to FALSE.

Value

a list with the same components as fit.gpd, in addition to

References

Dupuis, D.J. (1998). Exceedances over High Thresholds: A Guide to Threshold Selection, Extremes, 1(3), 251–261.

See Also

fit.gpd

Examples

dat <- rexp(100)
.fit.gpd.rob(dat, 0.1)

Posterior predictive distribution and density for the GEV distribution

Description

This function calculates the posterior predictive density at points x based on a matrix of posterior parameter samples.

Usage

.gev.postpred(x, posterior, Nyr = 100, type = c("density", "quantile"))

Arguments

x

vector of n points

posterior

n by 3 matrix of posterior samples

Nyr

number of years to extrapolate

type

string indicating whether to return the posterior density or the quantile.

Value

a vector of values for the posterior predictive density or quantile at x, depending on the value of type


Maximum likelihood method for the generalized Pareto Model

Description

Maximum-likelihood estimation for the generalized Pareto model, including generalized linear modelling of each parameter. This function was adapted by Paul Northrop to include the gradient in the gpd.fit routine from ismev.

Usage

.gpd_2D_fit(xdat, threshold, npy = 365, ydat = NULL, sigl = NULL, shl = NULL,
  siglink = identity, shlink = identity, siginit = NULL, shinit = NULL,
  show = TRUE, method = "Nelder-Mead", maxit = 10000, ...)

Arguments

xdat

numeric vector of data to be fitted.

threshold

a scalar or a numeric vector of the same length as xdat.

npy

number of observations per year/block.

ydat

matrix of covariates for generalized linear modelling of the parameters (or NULL (the default) for stationary fitting). The number of rows should be the same as the length of xdat.

sigl

numeric vector of integers, giving the columns of ydat that contain covariates for generalized linear modelling of the scale parameter (or NULL (the default) if the corresponding parameter is stationary).

shl

numeric vector of integers, giving the columns of ydat that contain covariates for generalized linear modelling of the shape parameter (or NULL (the default) if the corresponding parameter is stationary).

siglink

inverse link functions for generalized linear modelling of the scale parameter

shlink

inverse link functions for generalized linear modelling of the shape parameter

siginit

numeric giving initial value(s) for parameter estimates. If NULL, the default is sqrt(6 * var(xdat))/pi

shinit

numeric giving initial value(s) for the shape parameter estimate; if NULL, this is 0.1. If using parameter covariates, then these values are used for the constant term, and zeros for all other terms.

show

logical; if TRUE (default), print details of the fit.

method

optimization method (see optim for details).

maxit

maximum number of iterations.

...

other control parameters for the optimization. These are passed to components of the control argument of optim.

Details

For non-stationary fitting it is recommended that the covariates within the generalized linear models are (at least approximately) centered and scaled (i.e., the columns of ydat should be approximately centered and scaled).

The form of the GP model used follows Coles (2001) Eq (4.7). In particular, the shape parameter is defined so that positive values imply a heavy tail and negative values imply a bounded upper value.

Value

a list with components

nexc

scalar giving the number of threshold exceedances.

nllh

scalar giving the negative log-likelihood value.

mle

numeric vector giving the MLEs for the scale and shape parameters, respectively.

rate

scalar giving the estimated probability of exceeding the threshold.

se

numeric vector giving the standard error estimates for the scale and shape parameter estimates, respectively.

trans

logical indicator for a non-stationary fit.

model

list with components sigl and shl.

link

character vector giving inverse link functions.

threshold

threshold, or vector of thresholds.

nexc

number of data points above the threshold.

data

data that lie above the threshold. For non-stationary models, the data are standardized.

conv

convergence code, taken from the list returned by optim. A zero indicates successful convergence.

nllh

negative log likelihood evaluated at the maximum likelihood estimates.

vals

matrix with three columns containing the maximum likelihood estimates of the scale and shape parameters, and the threshold, at each data point.

mle

vector containing the maximum likelihood estimates.

rate

proportion of data points that lie above the threshold.

cov

covariance matrix.

se

numeric vector containing the standard errors.

n

number of data points (i.e., the length of xdat).

npy

number of observations per year/block.

xdata

data that has been fitted.

References

Coles, S., 2001. An Introduction to Statistical Modeling of Extreme Values. Springer-Verlag, London.
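
Examples

A minimal sketch, not part of the package manual; it assumes the internal routine can be reached through the namespace with mev::: and that a scalar threshold is acceptable.

set.seed(202)
xdat <- rexp(500)
# stationary fit above the 90% sample quantile; show = FALSE suppresses printing
fit <- mev:::.gpd_2D_fit(xdat = xdat, threshold = quantile(xdat, 0.9), show = FALSE)
fit$mle
fit$se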


Is the matrix conditionally negative semi-definite? Function adapted from 'is.CNSD' in the CEGO package, v 2.1.0

Description

Is the matrix conditionally negative semi-definite? Function adapted from 'is.CNSD' in the CEGO package, v 2.1.0

Usage

.is.CNSD(X, tol = 1e-08)

Arguments

X

a symmetric matrix

tol

tolerance value; eigenvalues between -tol and tol are assumed to be zero.

Author(s)

Martin Zaefferer
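
Examples

An illustrative sketch, not from the manual; it assumes the helper is accessible via mev::: and uses the fact that a power semivariogram matrix is conditionally negative semi-definite.

loc <- runif(5)
X <- as.matrix(dist(loc))^1.5  # semivariogram matrix gamma(h) = |h|^1.5
mev:::.is.CNSD(X)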


Internal function

Description

Takes a list of asymmetry parameters with an associated dependence vector and returns the corresponding asymmetry matrix for the asymmetric logistic and asymmetric negative logistic models

Usage

.mvasym.check(asy, dep, d, model = c("alog", "aneglog", "maxlin"))

Arguments

asy

a list of 2^d-1 asymmetry components, as in Stephenson's bvevd functions

dep

vector of 2^d-d-1 values for the dependence parameter

d

dimension of the model

model

either alog for the asymmetric logistic or aneglog for the asymmetric negative logistic

Details

This function is extracted from the evd package and modified. (C) Alec Stephenson

Value

a matrix of asymmetry components, enumerating all possible 2^d-1 subsets of the power set


Generate from Brown-Resnick process Y \sim P_x, where P_x is the probability of the extremal function

Description

Generate from Brown-Resnick process Y \sim P_x, where P_x is the probability of the extremal function

Usage

.rPBrownResnick(index, Sigma_chol, Sigma)

Arguments

index

index of the location. An integer in 0, ..., d-1

Sigma

a positive definite covariance matrix

Value

a d-vector from P_x


Generate from extremal Husler-Reiss distribution Y \sim P_x, where P_x is the probability of the extremal function

Description

Generate from extremal Husler-Reiss distribution Y \sim P_x, where P_x is the probability of the extremal function

Usage

.rPHuslerReiss(index, cholesky, Sigma)

Arguments

index

index of the location. An integer in 0, ..., d-1

cholesky

the Cholesky root of Sigma

Sigma

a covariance matrix formed from the symmetric square matrix of coefficients \lambda^2

Value

a d-vector from P_x


Generate from Smith model (moving maxima) Y \sim P_x, where P_x is the probability of the extremal function

Description

Generate from Smith model (moving maxima) Y \sim P_x, where P_x is the probability of the extremal function

Usage

.rPSmith(index, Sigma_chol, loc)

Arguments

index

index of the location. An integer in 0, ..., d-1

Sigma_chol

the Cholesky root of the covariance matrix

loc

location matrix

Value

a d-vector from P_x


Generate from bilogistic Y \sim P_x, where P_x is the probability of extremal functions

Description

Generate from bilogistic Y \sim P_x, where P_x is the probability of extremal functions

Usage

.rPbilog(d, index, alpha)

Arguments

d

dimension of the 1-sample

index

index of the location. An integer in 0, ..., d-1

alpha

a d-dimensional vector of positive parameter values for the Dirichlet vector

Value

a d-vector from P_x


Generate from extremal Dirichlet Y \sim P_x, where P_x is the probability of extremal functions from the Dirichlet model of Coles and Tawn.

Description

Note: we generate from the Dirichlet rather than the Gamma distribution, since the former is parallelized

Usage

.rPdir(d, index, alpha, irv = FALSE)

Arguments

d

dimension of the 1-sample

index

index of the location. An integer in 0, ..., d-1

alpha

a d-dimensional vector of positive parameter values for the Dirichlet vector, or d+1 if the last entry is the index of regular variation of the model, a constant in (0, 1]

irv

should the usual model (FALSE) or the general scaled version (TRUE) be used

Value

a d-vector from P_x


Generate from extremal Dirichlet Y \sim P_x, where P_x is the probability of extremal functions from a Dirichlet mixture

Description

Generate from extremal Dirichlet Y \sim P_x, where P_x is the probability of extremal functions from a Dirichlet mixture

Usage

.rPdirmix(d, index, alpha, weight)

Arguments

d

dimension of the 1-sample

index

index of the location. An integer in 0, ..., d-1

alpha

a d \times n dimensional vector of positive parameter values for the Dirichlet vector

weight

an m-vector of mixture weights, which sum to 1

Value

a d-vector from P_x


Generate from extremal Student-t Y \sim P_x, where P_x is the probability of the extremal function

Description

Generate from extremal Student-t Y \sim P_x, where P_x is the probability of the extremal function

Usage

.rPexstud(index, cholesky, sigma, al)

Arguments

index

index of the location. An integer in 0, ..., d-1

cholesky

Cholesky root of transformed correlation matrix

sigma

a positive semi-definite correlation matrix

al

the alpha parameter in Proposition 7. Corresponds to degrees of freedom - 1

Value

a d-vector from P_x


Generate from logistic Y \sim P_x, where P_x is the probability of the extremal function scaled by a Frechet variate

Description

Generate from logistic Y \sim P_x, where P_x is the probability of the extremal function scaled by a Frechet variate

Usage

.rPlog(d, index, theta)

Arguments

d

dimension of the 1-sample

index

index of the location. An integer in 0, ..., d-1

theta

a one-dimensional parameter for the logistic model, strictly greater than 1.

Value

a d-vector from P_x


Generate from negative logistic Y \sim P_x, where P_x is the probability of the extremal function scaled by a Frechet variate

Description

Generate from negative logistic Y \sim P_x, where P_x is the probability of the extremal function scaled by a Frechet variate

Usage

.rPneglog(d, index, theta)

Arguments

d

dimension of the 1-sample

index

index of the location. An integer in 0, ..., d-1

theta

a one-dimensional parameter for the negative logistic model

Value

a d-vector from P_x


Samples from exceedances at site (scaled extremal function definition)

Description

Samples from exceedances at site (scaled extremal function definition)

Usage

.rPsite(n, j, d, par, model, Sigma, loc)

Arguments

n

sample size

j

index of the site or variable

d

dimension of the multivariate distribution

par

a vector of parameters

model

integer, currently ranging from 1 to 9, corresponding respectively to (1) log, (2) neglog, (3) dirmix, (4) bilog, (5) extstud, (6) br, (7) ct and sdir, (8) smith and (9) hr.

Sigma

covariance matrix for Brown-Resnick, Smith and extremal student. Default for compatibility

loc

matrix of locations for the Smith model.

Value

an n by d matrix containing the sample


Generates from Q_i, the spectral measure of the bilogistic model

Description

Simulation algorithm of Boldi (2009) for the bilogistic model

Usage

.rbilogspec(n, alpha)

Arguments

n

sample size

alpha

vector of parameters of dimension d

Value

an n by d sample from the spectral distribution

References

Boldi (2009). A note on the representation of parametric models for multivariate extremes. Extremes, 12, 211–218.


Generates from Q_i, the spectral measure of the Brown-Resnick model

Description

Simulation algorithm of Dombry et al. (2016)

Usage

.rbrspec(n, Sigma_chol, Sigma)

Arguments

n

sample size

Sigma_chol

Cholesky root of Sigma

Sigma

d-dimensional covariance matrix

Value

an n by d sample from the spectral distribution

References

Dombry, Engelke and Oesting (2016). Exact simulation of max-stable processes, Biometrika, 103(2), 303–317.


Generates from Q_i, the spectral measure of the Dirichlet mixture model

Description

Simulation algorithm of Dombry et al. (2016)

Usage

.rdirmixspec(n, d, alpha, weight)

Arguments

n

sample size

d

dimension of the 1-sample

alpha

a d \times n dimensional vector of positive parameter values for the Dirichlet vector

weight

an m-vector of mixture weights, which sum to 1

Value

an n by d sample from the spectral distribution


Generates from Q_i, the spectral measure of the extremal Dirichlet model

Description

This model was introduced in Coles and Tawn (1991); the present method uses the simulation algorithm of Boldi (2009) for the extremal Dirichlet model

Usage

.rdirspec(n, d, alpha, irv = FALSE)

Arguments

n

sample size

d

dimension of sample

alpha

vector of Dirichlet parameters of dimension d, or a d+1 vector with the d Dirichlet parameters and an index of regular variation in [0, 1]

irv

should the usual model (FALSE) or the general scaled version (TRUE) be used

Value

an n by d sample from the spectral distribution

References

Boldi (2009). A note on the representation of parametric models for multivariate extremes. Extremes, 12, 211–218.


Generates from Q_i, the spectral measure of the extremal Student model

Description

Generates from Q_i, the spectral measure of the extremal Student model

Usage

.rexstudspec(n, sigma, al)

Arguments

sigma

a positive semi-definite covariance matrix with unit variance

al

the alpha parameter in Proposition 7. Corresponds to degrees of freedom - 1

Value

an n by d sample from the spectral distribution


Generates from Q_i, the spectral measure of the Husler-Reiss model

Description

Generates from Q_i, the spectral measure of the Husler-Reiss model

Usage

.rhrspec(n, Lambda)

Arguments

Lambda

a symmetric square matrix of coefficients \lambda^2

Value

an n by d sample from the spectral distribution


Generates from Q_i, the spectral measure of the logistic model

Description

Simulation algorithm of Dombry et al. (2016)

Usage

.rlogspec(n, d, theta)

Arguments

n

sample size

theta

a one-dimensional parameter

Value

an n by d sample from the spectral distribution

References

Dombry, Engelke and Oesting (2016). Exact simulation of max-stable processes, Biometrika, 103(2), 303–317.


Multivariate extreme value distribution sampling algorithm via angular measure

Description

This algorithm corresponds to Algorithm 1 in Dombry, Engelke and Oesting (2016), using the formulation of the Dirichlet mixture of Coles and Tawn (1991) as described and derived in Boldi (2009) for the bilogistic and extremal Dirichlet model. Models currently implemented include the logistic, negative logistic, extremal Dirichlet and bilogistic MEV.

Usage

.rmevA1(n, d, par, model, Sigma, loc)

Arguments

n

sample size

d

dimension of the multivariate distribution

par

a vector of parameters

model

integer, currently ranging from 1 to 9, corresponding respectively to (1) log, (2) neglog, (3) dirmix, (4) bilog, (5) extstud, (6) br, (7) ct and sdir, (8) smith and (9) hr.

Sigma

covariance matrix for Brown-Resnick, Smith and extremal student. Conditionally negative definite matrix of parameters for the Huesler–Reiss model. Default matrix for compatibility

loc

matrix of locations for the Smith model.

Value

an n by d matrix containing the sample


Multivariate extreme value distribution sampling algorithm via extremal functions

Description

Code implementing Algorithm 2 in Dombry, Engelke and Oesting (2016)

Usage

.rmevA2(n, d, par, model, Sigma, loc)

Arguments

n

sample size

d

dimension of the multivariate distribution

par

a vector of parameters

model

integer, currently ranging from 1 to 9, corresponding respectively to (1) log, (2) neglog, (3) dirmix, (4) bilog, (5) extstud, (6) br, (7) ct and sdir, (8) smith and (9) hr.

Sigma

covariance matrix for Brown-Resnick, Smith and extremal student. Default for compatibility

loc

matrix of locations for the Smith model.

Value

an n by d matrix containing the sample


Random samples from asymmetric logistic distribution

Description

Simulation algorithm of Stephenson (2003), using exact samples from the logistic

Usage

.rmevasy(n, d, par, asym, ncompo, Sigma, model)

Arguments

n

sample size

d

dimension of the multivariate distribution

par

a vector of parameters

asym

matrix of booleans indicating which components belong to the logistic model of the corresponding row

ncompo

number of components for the (negative) logistic model in each row

Sigma

matrix of asymmetry parameters

Value

an n by d matrix containing the sample

References

Stephenson, A. G. (2003). Simulating multivariate extreme value distributions of logistic type. Extremes, 6(1), 49–60.

Joe, H. (1990). Families of min-stable multivariate exponential and multivariate extreme value distributions, Statistics & Probability Letters, 9, 75–81.


Random sampling from spectral distribution on l1 sphere

Description

Generate from Q_i, the spectral measure of a given multivariate extreme value model

Usage

.rmevspec_cpp(n, d, par, model, Sigma, loc)

Arguments

n

sample size

d

dimension of the multivariate distribution

par

a vector of parameters

model

integer, currently ranging from 1 to 9, corresponding respectively to (1) log, (2) neglog, (3) dirmix, (4) bilog, (5) extstud, (6) br, (7) ct and sdir, (8) smith and (9) hr.

Sigma

covariance matrix for Brown-Resnick and extremal student, symmetric matrix of squared coefficients \lambda^2 for Husler-Reiss. Default for compatibility

loc

matrix of locations for the Smith model

Value

an n by d matrix containing the sample

References

Dombry, Engelke and Oesting (2016). Exact simulation of max-stable processes, Biometrika, 103(2), 303–317.

Boldi (2009). A note on the representation of parametric models for multivariate extremes. Extremes, 12, 211–218.
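
Examples

Users would normally call the exported wrapper rather than this internal routine. A minimal sketch, not from the manual, assuming rmevspec shares the n, d, param and model arguments used by rmev in the examples elsewhere in this document:

ang <- rmevspec(n = 100, d = 3, param = 0.5, model = "log")
rowSums(ang)  # angles lie on the l1 sphere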


Multivariate Normal distribution sampler (Rcpp version), derived using the eigendecomposition of the covariance matrix Sigma. The function utilizes the arma random normal generator

Description

Multivariate Normal distribution sampler (Rcpp version), derived using the eigendecomposition of the covariance matrix Sigma. The function utilizes the arma random normal generator

Usage

.rmnorm_arma(n, Mu, Xmat, eigen = TRUE)

Arguments

n

sample size

Mu

mean vector. Will set the dimension

Xmat

covariance matrix of the same dimension as Mu (a square matrix). No sanity check is performed to validate that the matrix is symmetric, so use at your own risk

Value

a sample of size n from a multivariate Normal distribution


Generates from Q_i, the spectral measure of the negative logistic model

Description

Simulation algorithm of Dombry et al. (2016)

Usage

.rneglogspec(n, d, theta)

Arguments

n

sample size

theta

a one-dimensional parameter

Value

an n by d sample from the spectral distribution

References

Dombry, Engelke and Oesting (2016). Exact simulation of max-stable processes, Biometrika, 103(2), 303–317.


Generates from Q_i, the spectral measure of the pairwise Beta model

Description

This model was introduced in Cooley, Davis and Naveau (2010). The sample is drawn from a mixture and the algorithm follows from the proof of Theorem 1 in Ballani and Schlather (2011), and is written in full in Algorithm 1 of Sabourin et al. (2013).

Usage

.rpairbetaspec(n, d, alpha, beta)

Arguments

n

sample size

alpha

concentration parameter

beta

vector of all pairwise components (lexicographic order, by row)

Value

a matrix of samples from the angular distribution

an n by d sample from the spectral distribution

Author(s)

Leo Belzile

References

Cooley, D., R.A. Davis and P. Naveau (2010). The pairwise beta distribution: A flexible parametric multivariate model for extremes, Journal of Multivariate Analysis, 101(9), 2103–2117.

Ballani, F. and M. Schlather (2011). A construction principle for multivariate extreme value distributions, Biometrika, 98(3), 633–645.

Sabourin, A., P. Naveau and A. Fougeres (2013). Bayesian model averaging for extremes, Extremes, 16, 325–350.


Generates from Q_i, the spectral measure of the pairwise exponential model

Description

The sample is drawn from a mixture and the algorithm follows from the proof of Theorem 1 in Ballani and Schlather (2011).

Usage

.rpairexpspec(n, d, alpha, beta)

Arguments

n

sample size

alpha

concentration parameter

beta

vector of all pairwise components (lexicographic order, by row)

Value

a matrix of samples from the angular distribution

an n by d sample from the spectral distribution

Author(s)

Leo Belzile

References

Ballani, F. and M. Schlather (2011). A construction principle for multivariate extreme value distributions, Biometrika, 98(3), 633–645.


Generates from Q_i, the spectral measure of the Smith model (moving maxima)

Description

Simulation algorithm of Dombry et al. (2016)

Usage

.rsmithspec(n, Sigma_chol, loc)

Arguments

n

sample size

Sigma_chol

Cholesky decomposition of the d-dimensional covariance matrix (upper triangular)

loc

location matrix

Value

an n by d sample from the spectral distribution

References

Dombry, Engelke and Oesting (2016). Exact simulation of max-stable processes, Biometrika, 103(2), 303–317.


Generates from Q_i, the spectral measure of the weighted Dirichlet model

Description

This model was introduced in Ballani and Schlather (2011).

Usage

.rwdirbsspec(n, d, alpha, beta)

Arguments

n

sample size

alpha

vector of concentration parameters

beta

vector of Dirichlet components

Value

a matrix of samples from the angular distribution

an n by d sample from the spectral distribution

Author(s)

Leo Belzile

References

Ballani, F. and M. Schlather (2011). A construction principle for multivariate extreme value distributions, Biometrika, 98(3), 633–645.


Generates from Q_i, the spectral measure of the weighted exponential model

Description

This model was introduced in Ballani and Schlather (2011).

Usage

.rwexpbsspec(n, d, alpha, beta)

Arguments

n

sample size

alpha

vector of concentration parameters

beta

vector of Dirichlet components

Value

a matrix of samples from the angular distribution

an n by d sample from the spectral distribution

Author(s)

Leo Belzile

References

Ballani, F. and M. Schlather (2011). A construction principle for multivariate extreme value distributions, Biometrika, 98(3), 633–645.


Weighted empirical distribution function

Description

Compute an empirical distribution function with weights w at each of x

Usage

.wecdf(x, w)

Arguments

x

observations

w

weights

Value

a function of class ecdf
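
Examples

A minimal sketch, not from the manual; it assumes the internal function is accessed via mev:::.

set.seed(1)
x <- rexp(20)
w <- rep(1/20, 20)  # equal weights recover the usual empirical distribution function
Fn <- mev:::.wecdf(x, w)
Fn(1)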


Transform variogram matrix to covariance of conditional random field

Description

The matrix Lambda is half the semivariogram matrix. The function returns the conditional covariance with respect to entries in co, restricted to the subA rows and the subB columns.

Usage

Lambda2cov(Lambda, co, subA, subB)

Arguments

Lambda

Negative definite matrix for the Huesler–Reiss model

co

vector of integers with conditioning sites

subA

vector of integers with sub-entries (not in co) for rows

subB

vector of integers with sub-entries (not in co) for columns. If missing, defaults to subA.

Value

a covariance matrix
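
Examples

An illustrative sketch, not from the manual; Lambda below is half a power semivariogram matrix, and the call assumes the function is exported (otherwise prefix it with mev:::).

loc <- cbind(runif(4), runif(4))
Lambda <- as.matrix(dist(loc))^1.5 / 2
Lambda2cov(Lambda, co = 1, subA = 2:4)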


Score and likelihood ratio tests of equality of shape over multiple thresholds

Description

The function returns a P-value path for the score test and/or likelihood ratio test for equality of the shape parameters over multiple thresholds under the generalized Pareto model.

Usage

NC.diag(xdat, u, GP.fit = c("Grimshaw", "nlm", "optim", "ismev"),
  do.LRT = FALSE, size = NULL, plot = TRUE, ..., xi.tol = 0.001)

Arguments

xdat

numeric vector of raw data

u

m-vector of thresholds (sorted from smallest to largest)

GP.fit

function used to optimize the generalized Pareto model.

do.LRT

boolean indicating whether to perform the likelihood ratio test (in addition to the score test)

size

level at which a horizontal line is drawn on multiple threshold plot

plot

logical; if TRUE, return a plot of p-values against threshold.

...

additional parameters passed to plot

xi.tol

numerical tolerance for threshold distance; if the absolute value of xi1.hat is less than xi.tol, use linear interpolation to evaluate score vectors, expected Fisher information matrices and Hessians

Details

The default method is 'Grimshaw', which uses the reduction of the parameters to a one-dimensional maximization. Other options are one-dimensional maximization of the profile via the nlm function or optim, and two-dimensional optimization via ismev, using the routine from gpd.fit in the ismev library with the addition of the algebraic gradient. The choice of GP.fit should make no difference, but the options were kept. Warning: the function will not recover from failure of the maximization routine, returning various error messages.

Value

a plot of P-values for the test at the different thresholds u

Author(s)

Paul J. Northrop and Claire L. Coleman

References

Grimshaw (1993). Computing Maximum Likelihood Estimates for the Generalized Pareto Distribution, Technometrics, 35(2), 185–191.

Northrop & Coleman (2014). Improved threshold diagnostic plots for extreme value analyses, Extremes, 17(2), 289–303.

Wadsworth & Tawn (2012). Likelihood-based procedures for threshold diagnostics and uncertainty in extreme value modelling, J. R. Statist. Soc. B, 74(3), 543–567.

Examples

## Not run: 
data(nidd)
u <- seq(65, 90, by = 1L)
NC.diag(nidd, u, size = 0.05)
## End(Not run)

Extreme U-statistic Pickands estimator

Description

[Deprecated function] Given a random sample of n exceedances, the estimator returns an estimate of the shape parameter or extreme value index using a kernel of order 3, based on the k largest exceedances of xdat. Oorschot et al. (2023) parametrize the model in terms of m=n-k, so that m=n corresponds to using only the three largest observations.

Usage

PickandsXU(xdat, m)

Arguments

xdat

vector of observations of length n

m

number of largest order statistics, 3 \leq k \leq n.

Details

The calculations are based on the recursions provided in Lemma 4.3 of Oorschot et al.

References

Oorschot, J., J. Segers and C. Zhou (2023), Tail inference using extreme U-statistics, Electronic Journal of Statistics, 17(1), 1113–1159. doi:10.1214/23-EJS2129
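
Examples

A minimal sketch, not from the manual; rgp from the package is used to simulate generalized Pareto exceedances, and m is chosen so that the 50 largest observations enter the estimator under the m = n - k parametrization described above.

set.seed(1234)
xdat <- rgp(n = 500, loc = 0, scale = 1, shape = 0.25)
PickandsXU(xdat = xdat, m = 450)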


Stein's vector of weights

Description

Computes a vector of decreasing weights for exceedances.

Usage

Stein_weights(n, gamma = 1)

Arguments

n

integer, sample size

gamma

positive scalar for power

Value

a vector of positive weights of length n

References

Stein, M.L. (2023). A weighted composite log-likelihood approach to parametric estimation of the extreme quantiles of a distribution. Extremes, 26, 469–507 <doi:10.1007/s10687-023-00466-w>
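
Examples

A minimal sketch, not from the manual, showing the length and positivity of the weight vector.

w <- Stein_weights(n = 10, gamma = 1)
length(w)
all(w > 0)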


Wadsworth's univariate and bivariate exponential threshold diagnostics

Description

Function to produce diagnostic plots and test statistics for the threshold diagnostics exploiting structure of maximum likelihood estimators based on the non-homogeneous Poisson process likelihood

Usage

W.diag(xdat, model = c("nhpp", "exp", "invexp"), u = NULL, k, q1 = 0, q2 = 1,
  par = NULL, M = NULL, nbs = 1000, alpha = 0.05, plots = c("LRT", "WN", "PS"),
  UseQuantiles = FALSE, changepar = TRUE, transform = TRUE, ...)

Arguments

xdat

a numeric vector of data to be fitted.

model

string specifying whether the univariate or bivariate diagnostic should be used. Either nhpp for the univariate model, or exp (invexp) for the bivariate exponential model with rate (inverse rate) parametrization. See details.

u

optional; vector of candidate thresholds.

k

number of thresholds to consider (if u unspecified).

q1

lowest quantile for the threshold sequence.

q2

upper quantile limit for the threshold sequence (q2 itself is not used as a threshold, but rather the uppermost threshold will be at the q2-1/k quantile).

par

parameters of the NHPP likelihood. If missing, the fit.pp routine will be run to obtain values

M

number of superpositions or 'blocks' / 'years' the process corresponds to (can affect the optimization)

nbs

number of simulations used to assess the null distribution of the LRT, and produce the p-value

alpha

significance level of the LRT

plots

vector of strings indicating which plots to produce; LRT = likelihood ratio test, WN = white noise, PS = parameter stability. Use NULL if you do not want plots to be produced

UseQuantiles

logical; use quantiles as the thresholds in the plot?

changepar

logical; if TRUE, the graphical parameters (via a call to par) are modified.

...

additional parameters passed to plot, overriding defaults including

Details

The function is a wrapper for the univariate (non-homogeneous Poisson process model) and bivariate exponential dependence model. For the latter, the user can select either the rate or inverse rate parameter (the inverse rate parametrization works better for uniformity of the p-value distribution under the LR test).

There are two options for the bivariate diagnostic: either provide pairwise minima of marginally exponentially distributed margins, or provide an n times 2 matrix with the original data, which is transformed to exponential margins using the empirical distribution function.

Value

plots of the requested diagnostics and an invisible list with components

Author(s)

Jennifer L. Wadsworth

References

Wadsworth, J.L. (2016). Exploiting Structure of Maximum Likelihood Estimators for Extreme Value Threshold Selection, Technometrics, 58(1), 116–126, http://dx.doi.org/10.1080/00401706.2014.998345.

Examples

## Not run: 
set.seed(123)
# Parameter stability only
W.diag(xdat = abs(rnorm(5000)), model = 'nhpp', k = 30, q1 = 0, plots = "PS")
W.diag(rexp(1000), model = 'nhpp', k = 20, q1 = 0)
xbvn <- rmnorm(n = 6000, mu = rep(0, 2), Sigma = cbind(c(1, 0.7), c(0.7, 1)))
# Transform margins to exponential manually
xbvn.exp <- -log(1 - pnorm(xbvn))
# rate parametrization
W.diag(xdat = apply(xbvn.exp, 1, min), model = 'exp', k = 30, q1 = 0)
W.diag(xdat = xbvn, model = 'exp', k = 30, q1 = 0)
# inverse rate parametrization
W.diag(xdat = apply(xbvn.exp, 1, min), model = 'invexp', k = 30, q1 = 0)
## End(Not run)

Abisko rainfall

Description

Daily non-zero rainfall measurements in Abisko (Sweden) from January 1913 until December 2014.

Arguments

date

Date of the measurement

precip

rainfall amount (in mm)

Format

a data frame with 15132 rows and two variables

Source

Abisko Scientific Research Station

References

A. Kiriliouk, H. Rootzen, J. Segers and J.L. Wadsworth (2019), Peaks over thresholds modeling with multivariate generalized Pareto distributions, Technometrics, 61(1), 123–135, <doi:10.1080/00401706.2018.1462738>
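
Examples

A short usage sketch, not from the manual; it assumes the data frame is shipped with the package under the name abisko.

data(abisko, package = "mev")
summary(abisko$precip)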


Estimation of the bivariate angular dependence function

Description

Estimation of the bivariate angular dependence function

Usage

adf(xdat, qlev = 0.95, estimator = c("hill", "mle", "bayes"), level = 0.95,
  ties.method = "random", angles = seq(0, 1, by = 0.02), plot = TRUE)

Arguments

xdat

an n by 2 matrix of multivariate observations

qlev

quantile level on uniform scale at which to threshold data. Defaults to 0.95

estimator

string indicating the estimation method

level

level for confidence intervals, defaults to 0.95

ties.method

method for handling of ties in rank transformation

angles

vector of angles at which to evaluate the angular dependence function. The confidence intervals are based on normal quantiles. The standard errors for hill are based on the asymptotic covariance and those of the mle are derived using the delta method. Bayesian posterior predictive interval estimates are obtained using ratio-of-uniform sampling with flat priors: the shape parameters are constrained to lie within the triangle, as are frequentist point estimates, which are adjusted post-inference.

plot

logical indicating whether to plot the function, defaults to TRUE

Value

a plot of the angular dependence function if plot=TRUE, plus an invisible list with components

References

J.L. Wadsworth and J.A. Tawn (2013). A new representation for multivariate tail probabilities, Bernoulli, 19(5B), 2689–2714.

Examples

set.seed(12)
dat <- mev::rmev(n = 1000, d = 2, model = "log", param = 0.1)
adf(xdat = dat, estimator = 'hill')

Bivariate angular dependence function for extrapolation based on rays

Description

The scale parameter g(w) in the Ledford and Tawn approach is estimated empirically for x large as

\hat{g}(w) = \frac{\Pr(X_P>xw, Y_P>x(1-w))}{\Pr(X_P>x, Y_P>x)}

where the sample (X_P, Y_P) consists of observations on the unit Pareto scale. The coefficient \eta is estimated using maximum likelihood as the shape parameter of a generalized Pareto distribution on \min(X_P, Y_P).

Usage

angextrapo(xdat, qlev = 0.95, w = seq(0.05, 0.95, length = 20), ...)

Arguments

xdat

an n by 2 matrix of observations

qlev

quantile level on uniform scale at which to threshold data. Defaults to 0.95

w

vector of unique angles between 0 and 1 at which to evaluate scale empirically.

...

additional arguments, for backward compatibility

Value

a list with elements

References

Ledford, A.W. and J.A. Tawn (1996), Statistics for near independence in multivariate extreme values. Biometrika, 83(1), 169–187.

Examples

angextrapo(rmev(n = 1000, model = 'log', d = 2, param = 0.5))

Rank-based transformation to angular measure

Description

The method uses the pseudo-polar transformation for suitable norms, transforming the data to pseudo-observations, then marginally to unit Frechet or unit Pareto. Empirical or Euclidean weights are computed and returned alongside the angular and radial sample for values above threshold(s) thresh, specified in terms of quantiles of the radial component R or marginal quantiles. Only complete tuples are kept.

Usage

angmeas(xdat, thresh, Rnorm = c("l1", "l2", "linf"),
  Anorm = c("l1", "l2", "linf", "arctan"), marg = c("frechet", "pareto"),
  wgt = c("empirical", "euclidean"), region = c("sum", "min", "max"),
  is.angle = FALSE, ...)

Arguments

xdat

an n by d sample matrix

thresh

threshold of length 1 for 'sum', or d marginal thresholds otherwise.

Rnorm

character string indicating the norm for the radial component.

Anorm

character string indicating the norm for the angular component. arctan is only implemented for d=2

marg

character string indicating choice of marginal transformation, either to Frechet or Pareto scale

wgt

character string indicating weighting function for the equation. Can be based on Euclidean or empirical likelihood for the mean

region

character string specifying which observations to consider (and weight). 'sum' corresponds to a radial threshold \sum x_i > thresh, 'min' to \min x_i > thresh and 'max' to \max x_i > thresh.

is.angle

logical indicating whether observations are already angles with respect to region. Defaults to FALSE.

...

additional arguments

Details

The empirical likelihood weighted mean problem is implemented for all thresholds, while the Euclidean likelihood is only supported for diagonal thresholds specified via region='sum'.

Value

a list with arguments ang for the d-1 pseudo-angular sample, rad with the radial component and possibly wts if Rnorm='l1' and the empirical likelihood algorithm converged. The Euclidean algorithm always returns weights even if some of these are negative.

a list with components

Author(s)

Leo Belzile

References

Einmahl, J.H.J. and J. Segers (2009). Maximum empirical likelihood estimation of the spectral measure of an extreme-value distribution, Annals of Statistics, 37(5B), 2953–2989.

de Carvalho, M., B. Oumow, J. Segers and M. Warchol (2013). A Euclidean likelihood estimator for bivariate tail dependence, Comm. Statist. Theory Methods, 42(7), 1176–1192.

Owen, A.B. (2001). Empirical Likelihood, CRC Press, 304p.

Examples

x <- rmev(n = 25, d = 3, param = 0.5, model = 'log')
wts <- angmeas(xdat = x, Rnorm = 'l1', Anorm = 'l1', marg = 'frechet', wgt = 'empirical')
wts2 <- angmeas(xdat = x, Rnorm = 'l2', Anorm = 'l2', marg = 'pareto')

Dirichlet mixture smoothing of the angular measure

Description

This function computes the empirical or Euclidean likelihood estimates of the spectral measure and uses the points returned from a call to angmeas to compute the Dirichlet mixture smoothing of de Carvalho, Warchol and Segers (2012), placing a Dirichlet kernel at each observation.

Usage

angmeasdir(xdat, thresh, Rnorm = c("l1", "l2", "linf"),
  Anorm = c("l1", "l2", "linf", "arctan"), marg = c("frechet", "pareto"),
  wgt = c("empirical", "euclidean"), region = c("sum", "min", "max"),
  is.angle = FALSE, ...)

Arguments

xdat

an n by d sample matrix

thresh

threshold of length 1 for 'sum', or d marginal thresholds otherwise.

Rnorm

character string indicating the norm for the radial component.

Anorm

character string indicating the norm for the angular component. arctan is only implemented for d=2

marg

character string indicating choice of marginal transformation, either to Frechet or Pareto scale

wgt

character string indicating weighting function for the equation. Can be based on Euclidean or empirical likelihood for the mean

region

character string specifying which observations to consider (and weight). 'sum' corresponds to a radial threshold \sum x_i > thresh, 'min' to \min x_i > thresh and 'max' to \max x_i > thresh.

is.angle

logical indicating whether observations are already angles with respect to region. Defaults to FALSE.

...

additional arguments

Details

The cross-validation bandwidth is the solution of

\max_{\nu} \sum_{i=1}^n \log \left\{ \sum_{k=1,k \neq i}^n p_{k, -i} f(\mathbf{w}_i; \nu \mathbf{w}_k)\right\},

where f is the density of the Dirichlet distribution and p_{k, -i} is the Euclidean weight obtained from estimating the Euclidean likelihood problem without observation i.

Value

an invisible list with components

Examples

set.seed(123)
x <- rmev(n = 100, d = 2L, param = 0.5, model = 'log')
out <- angmeasdir(x)

Automated mean residual life plots

Description

This function implements the automated proposal from Section 2.2 of Langousis et al. (2016) for mean residual life plots. It returns the threshold that minimizes the weighted mean squared error, along with moment estimators for the scale and shape parameters based on weighted least squares.

Usage

automrl(xdat, kmax, thresh, plot = TRUE, ...)

Arguments

xdat

[numeric] vector of observations

kmax

[integer] maximum number of order statistics

thresh

[numeric] vector of thresholds; if missing, uses all order statistics from the 20th largest until kmax as candidates

plot

[logical] if TRUE (default), return a plot of the mean residual life plot with the fitted slope and the chosen threshold

...

additional arguments, currently ignored

Details

The procedure consists in estimating the usual

Value

a list containing

References

Langousis, A., A. Mamalakis, M. Puliga and R. Deidda (2016). Threshold detection for the generalized Pareto distribution: Review of representative methods and application to the NOAA NCDC daily rainfall database, Water Resources Research, 52, 2659–2681.
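
Examples

An illustrative sketch, not from the manual; kmax is supplied explicitly since no default is documented, and rgp from the package is used to simulate the data.

set.seed(2023)
xdat <- rgp(n = 1000, loc = 0, scale = 2, shape = 0.2)
automrl(xdat = xdat, kmax = 500, plot = FALSE)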


Censored likelihood for multivariate peaks over threshold models

Description

Censored likelihoods for various parametric limiting models over the region determined by

\{y \in F: \max_{j=1}^D \sigma_j \frac{y_j^{\xi_j}-1}{\xi_j}+\mu_j > u\};

where \mu is loc, \sigma is scale and \xi is shape.

Usage

clikmgp(dat, thresh, mthresh = thresh, loc, scale, shape, par,
  model = c("log", "neglog", "br", "xstud"), likt = c("mgp", "pois", "binom"),
  lambdau = 1, ...)

Arguments

dat

matrix of observations

thresh

functional threshold for the maximum

mthresh

vector of individual thresholds under which observations are censored

loc

vector of location parameters for the marginal generalized Pareto distribution

scale

vector of scale parameters for the marginal generalized Pareto distribution

shape

vector of shape parameters for the marginal generalized Pareto distribution

par

list of parameters: alpha for the logistic model, Lambda for the Brown–Resnick model, or else Sigma and df for the extremal Student.

model

string indicating the model family, one of "log", "neglog", "br" or "xstud"

likt

string indicating the type of likelihood, with an additional contribution for the non-exceeding components: one of "mgp", "binom" and "pois".

lambdau

vector of marginal rates of threshold exceedance.

...

additional arguments (see Details)

Details

Optional arguments can be passed to the function via ...

Value

the value of the log-likelihood with attribute expme, giving the exponent measure

Note

The location and scale parameters are not identifiable unless one of them is fixed.


Confidence intervals for profile likelihood objects

Description

Computes confidence intervals for the parameter psi for profile likelihood objects. This function uses spline interpolation to derive level confidence intervals

Usage

## S3 method for class 'eprof'
confint(object, parm, level = 0.95, prob = c((1 - level)/2, 1 - (1 - level)/2),
  print = FALSE, method = c("cobs", "smooth.spline"), boundary = FALSE, ...)

Arguments

object

an object of class eprof, normally the output of gpd.pll or gev.pll.

parm

a specification of which parameters are to be given confidence intervals, either a vector of numbers or a vector of names. If missing, all parameters are considered.

level

confidence level, with default value of 0.95

prob

percentiles, with default giving symmetric 95% confidence intervals

print

logical; should a summary be printed? Defaults to FALSE.

method

string for the method, either cobs (constrained robust B-spline from the eponymous package) or smooth.spline

boundary

logical; if TRUE, the null distribution is assumed to be a mixture of a point mass and half a chi-square with one degree of freedom.

...

additional arguments passed to functions. Providing a logical warn=FALSE turns off warning messages when the lower or upper confidence interval for psi is extrapolated beyond the provided calculations.

Value

returns a 2 by 3 matrix containing point estimates, lower and upper confidence intervals based on the likelihood root and modified version thereof
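
Examples

A minimal sketch, not from the manual; it assumes gpd.pll accepts the param and dat arguments shown here and returns an object of class eprof for which this confint method applies.

set.seed(42)
dat <- rgp(n = 300, loc = 0, scale = 1, shape = 0.1)
prof <- gpd.pll(param = "shape", dat = dat)
confint(prof, print = TRUE)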


Threshold selection via coefficient of variation

Description

This function computes the empirical coefficient of variation and a weighted statistic comparing the squared distance with the theoretical coefficient of variation corresponding to a specific shape parameter (estimated from the data using a moment estimator as the value minimizing the test statistic, or using maximum likelihood). The procedure stops if there are no more than 10 exceedances above the highest threshold.

Usage

cvselect(xdat, thresh, method = c("mle", "wcv", "cv"), nsim = 999L,
  nthresh = 10L, level = 0.05, lazy = FALSE)

Arguments

xdat

[vector] vector of observations

thresh

[vector] vector of thresholds. If missing, set to p^k for k=0 to k=nthresh

method

[string] either moment estimator for the (weighted) coefficient of variation (wcv and cv) or maximum likelihood (mle)

nsim

[integer] number of bootstrap replications

nthresh

[integer] number of thresholds, if thresh is not supplied by the user

level

[numeric] probability level for sequential testing procedure

lazy

[logical] compute the bootstrap p-value until the test stops rejecting at level level? Defaults to FALSE

Value

a list with elements

References

del Castillo, J. and M. Padilla (2016). Modelling extreme values by the residual coefficient of variation, SORT, 40(2), pp. 303–320.
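
Examples

A minimal sketch, not from the manual; the threshold vector is left unspecified so that the documented default grid is used.

set.seed(123)
xdat <- rgp(n = 1000, loc = 0, scale = 1, shape = 0.1)
cvselect(xdat = xdat, method = "mle")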


Distance matrix with geometric anisotropy

Description

The function computes the distance between locations, with geometric anisotropy. Consider real parameters \theta_1 and \theta_2, and the transformation \psi=\arctan(\theta_1/\theta_2)/2 and r=1+\theta_1^2+\theta_2^2. The dilation and rotation matrix is

\left(\begin{matrix} \sqrt{r}\cos(\rho) & -\sqrt{r}\sin(\rho) \\ \sin(\rho)/\sqrt{r} & \cos(\rho)/\sqrt{r} \end{matrix} \right).

The parametrization is convenient for optimization purposes, as the parameter vector is unconstrained and the transformation has unit Jacobian.

Usage

dgeoaniso(loc, theta)

Arguments

loc

a d by 2 matrix of locations giving the coordinates of a site per row.

theta

numeric vector of length 2, real parameters

Value

a d by d square matrix of pairwise distances

References

Rai, K. and Brown, P.E. (2025), A parameter transformation of the anisotropic Matérn covariance function. Canadian Journal of Statistics, e11839. doi:10.1002/cjs.11839
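
Examples

A minimal sketch, not from the manual, computing anisotropic distances for five random sites with an arbitrary choice of theta.

set.seed(1)
loc <- cbind(runif(5), runif(5))
dgeoaniso(loc = loc, theta = c(0.5, -0.2))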


Distance matrix with geometric anisotropy

Description

The function computes the distance between locations, with geometric anisotropy. The parametrization assumes there is a scale parameter, say \sigma, so that scale is the distortion for the second component only. The angle rho must lie in [-\pi/2, \pi/2]. The dilation and rotation matrix is

\left(\begin{matrix} \cos(\rho) & \sin(\rho) \\ - \sigma\sin(\rho) & \sigma\cos(\rho) \end{matrix} \right)

Usage

distg(loc, scale, rho)

Arguments

loc

a d by 2 matrix of locations giving the coordinates of a site per row.

scale

numeric vector of length 1, greater than 1.

rho

angle for the anisotropy, which must be smaller than \pi/2 in modulus.

Value

a d by d square matrix of pairwise distances
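
Examples

A minimal sketch, not from the manual, computing anisotropic distances for five random sites; scale and rho are arbitrary illustrative values.

set.seed(1)
loc <- cbind(runif(5), runif(5))
distg(loc = loc, scale = 1.5, rho = 0.3)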


Extended generalised Pareto families

Description

This function provides the log-likelihood and quantiles for the three different families presented in Papastathopoulos and Tawn (2013) and the two proposals of Gamet and Jalbert (2022), plus exponential tilting. All of the models contain an additional parameter, \kappa \ge 0. All families share the same tail index as the generalized Pareto distribution, while allowing for lower thresholds. For most models, the distribution reduces to the generalised Pareto when \kappa=1 (for models gj-tnorm and logist, on the boundary of the parameter space when \kappa \to 0).

egp.retlev gives the return levels for the extended generalised Pareto distributions

Arguments

xdat

vector of observations, greater than the threshold

thresh

threshold value

par

parameter vector (\kappa, \sigma, \xi).

model

a string indicating which extended family to fit

show

logical; if TRUE, print the results of the optimization

p

extreme event probability; p must be greater than the rate of exceedance for the calculation to make sense. See Details.

plot

logical; if TRUE, a plot of the return levels

Details

For return levels, the p argument can be related to T-year exceedances as follows: if there are n_y observations per year, then take p equal to 1/(T n_y) to obtain the T-year return level.

Value

egp.ll returns the log-likelihood value, while egp.retlev returns a plot of the return levels if plot=TRUE and a list with tail probabilities p, return levels retlev, thresholds thresh and model name model.

Usage

egp.ll(xdat, thresh, model, par)

egp.retlev(xdat, thresh, par, model, p, plot=TRUE)

Author(s)

Leo Belzile

References

Papastathopoulos, I. and J. Tawn (2013). Extended generalised Pareto models for tail estimation, Journal of Statistical Planning and Inference, 143(3), 131–143, <doi:10.1016/j.jspi.2012.07.001>.

Gamet, P. and Jalbert, J. (2022). A flexible extended generalized Pareto distribution for tail estimation. Environmetrics, 33(6), <doi:10.1002/env.2744>.

Examples

set.seed(123)
xdat <- rgp(1000, loc = 0, scale = 2, shape = 0.5)
par <- fit.egp(xdat, thresh = 0, model = 'gj-beta')$par
p <- c(1/1000, 1/1500, 1/2000)
# With multiple thresholds
th <- c(0, 0.1, 0.2, 1)
opt <- tstab.egp(xdat, thresh = th, model = 'gj-beta')
egp.retlev(xdat = xdat, thresh = th, model = 'gj-beta', p = p)
opt <- tstab.egp(xdat, th, model = 'pt-power', plots = NA)
egp.retlev(xdat = xdat, thresh = th, model = 'pt-power', p = p)

Extended generalised Pareto log likelihood

Description

This function provides the log-likelihood and quantiles for the different extended generalized Pareto distributions. All families share the same tail index as the generalized Pareto, and reduce to the latter when kappa=1.

Usage

egp.ll(xdat, thresh, par,
  model = c("pt-beta", "pt-gamma", "pt-power", "gj-tnorm", "gj-beta", "exptilt", "logist"))

egp.retlev(xdat, thresh, par,
  model = c("pt-beta", "pt-gamma", "pt-power", "gj-tnorm", "gj-beta", "exptilt", "logist"),
  p, confint = FALSE, plot = TRUE)

Arguments

xdat

vector of observations, greater than the threshold

thresh

threshold value

par

parameter vector (\kappa, \sigma, \xi).

model

a string indicating which extended family to fit

p

extreme event probability; p must be greater than the rate of exceedance for the calculation to make sense. See Details.

plot

logical; if TRUE, a plot of the return levels


Fit of extended GP models and parameter stability plots

Description

This function is an alias of fit.egp.

Usage

egp.fit(xdat, thresh,
  model = c("pt-beta", "pt-gamma", "pt-power", "gj-tnorm", "gj-beta", "exptilt", "logist"),
  init, show = FALSE)

Details

Supported for backward compatibility


Deprecated function for parameter stability plots

Description

Deprecated function for parameter stability plots

Usage

egp.fitrange(xdat, thresh, model = c("egp1", "egp2", "egp3"), plots = 1:3,
  umin, umax, nint)

Profile log likelihood for extended generalized Pareto models

Description

Computes the profile log likelihood over a grid of values of \psi for various parameters, including return levels.

Usage

egp.pll(psi,
  model = c("pt-beta", "pt-gamma", "pt-power", "gj-tnorm", "gj-beta", "exptilt", "logist"),
  param = c("kappa", "scale", "shape", "retlev"), mle = NULL, xdat, thresh = NULL,
  plot = FALSE, method = c("Nelder", "nlminb", "BFGS"), p, ...)

Arguments

psi

grid of values for the parameter to profile

model

string; choice of extended generalized Pareto model.

param

string; parameter to profile

mle

a vector or matrix with maximum likelihood estimates of kappa, scale and shape. This can be a matrix if there are multiple thresholds

xdat

vector of observations

thresh

vector of positive thresholds. If NULL, defaults to zero.

plot

logical; if TRUE, returns a plot of the profile log likelihood

method

string giving the optimization method for the outer optimization in the augmented Lagrangian routine; one of nlminb or BFGS

p

tail probability for return level if param="retlev".

...

additional arguments, currently ignored

Value

an object of class eprof


Fit an extended generalized Pareto distribution of Naveau et al.

Description

Deprecated function name to fit an extended generalized Pareto family. The user should call fit.extgp instead.

Usage

egp2.fit(data, model = 1, method = c("mle", "pwm"), init, censoring = c(0, Inf),
  rounded = 0, CI = FALSE, R = 1000, ncpus = 1, plots = TRUE)

Arguments

data

data vector.

model

integer ranging from 0 to 4 indicating the model to select (see extgp).

method

string; either 'mle' for maximum likelihood, or 'pwm' for probability weighted moments, or both.

init

vector of initial values, comprising p, \kappa, \delta, \sigma, \xi (in that order) for the optimization. Not all parameters appear, depending on model.

censoring

numeric vector of length 2 containing the lower and upper bound for censoring; censoring=c(0,Inf) is equivalent to no censoring.

rounded

numeric giving the instrumental precision (and rounding of the data), with default of 0.

R

integer; number of bootstrap replications.

ncpus

integer; number of CPUs for parallel calculations (default: 1).

plots

logical; whether to produce histogram and density plots.

See Also

fit.extgp


Extended generalized Pareto distribution

Description

Density function, distribution function, quantile function and random number generation for various extended generalized Pareto distributions

Usage

pegp(q, scale, shape, kappa,
  model = c("pt-beta", "pt-gamma", "pt-power", "gj-tnorm", "gj-beta", "exptilt", "logist"),
  lower.tail = TRUE, log.p = FALSE)

degp(x, scale, shape, kappa,
  model = c("pt-beta", "pt-gamma", "pt-power", "gj-tnorm", "gj-beta", "exptilt", "logist"),
  log = FALSE)

qegp(p, scale, shape, kappa,
  model = c("pt-beta", "pt-gamma", "pt-power", "gj-tnorm", "gj-beta", "exptilt", "logist"),
  lower.tail = TRUE, log.p = FALSE)

regp(n, scale, shape, kappa,
  model = c("pt-beta", "pt-gamma", "pt-power", "gj-tnorm", "gj-beta", "exptilt", "logist"))

Arguments

scale

scale parameter, strictly positive.

shape

shape parameter.

kappa

shape parameter for the tilting distribution.

model

string giving the distribution of the model

lower.tail

logical; if TRUE (default), the lower tail probability \Pr(X \leq x) is returned.

log.p,log

logical; if FALSE (default), values are returned on the probability scale.

x,q

vector of quantiles

p

vector of probabilities

n

scalar number of observations

References

Papastathopoulos, I. and J. Tawn (2013). Extended generalised Pareto models for tail estimation, Journal of Statistical Planning and Inference, 143(3), 131–143, <doi:10.1016/j.jspi.2012.07.001>.

Gamet, P. and Jalbert, J. (2022). A flexible extended generalized Pareto distribution for tail estimation. Environmetrics, 33(6), <doi:10.1002/env.2744>.
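
Examples

A minimal sketch, not from the manual, checking that the quantile and distribution functions are inverses for one extended family and evaluating the density at simulated values.

qu <- qegp(p = c(0.25, 0.5, 0.9), scale = 1, shape = 0.1, kappa = 2, model = "pt-gamma")
pegp(q = qu, scale = 1, shape = 0.1, kappa = 2, model = "pt-gamma")
samp <- regp(n = 5, scale = 1, shape = 0.1, kappa = 2, model = "pt-gamma")
degp(x = samp, scale = 1, shape = 0.1, kappa = 2, model = "pt-gamma")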


Self-concordant empirical likelihood for a vector mean

Description

Self-concordant empirical likelihood for a vector mean

Usage

emplik(  dat,  mu = rep(0, ncol(dat)),  lam = rep(0, ncol(dat)),  eps = 1/nrow(dat),  M = 1e+30,  thresh = 1e-30,  itermax = 100)

Arguments

dat

n by d matrix of d-variate observations

mu

d-vector of hypothesized mean of dat

lam

starting values for the Lagrange multiplier vector, defaults to the zero vector

eps

lower cutoff for -\log, with default 1/nrow(dat)

M

upper cutoff for -\log.

thresh

convergence threshold for the log likelihood (the default of 1e-30 is aggressive)

itermax

upper bound on number of Newton steps.

Value

a list with components

Author(s)

Art B. Owen; C++ port by Leo Belzile

References

Owen, A.B. (2013). Self-concordance for empirical likelihood,Canadian Journal of Statistics,41(3), 387–397.
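No worked example accompanies this entry; the sketch below shows a minimal call, assuming the returned list exposes the empirical log-likelihood ratio under the name logelr (the component name is an assumption, not confirmed by this entry).

# Minimal sketch (component name 'logelr' assumed): test H0: E(X) = (0, 0)
set.seed(123)
dat <- matrix(rnorm(200), ncol = 2)
res <- emplik(dat = dat, mu = c(0, 0))
# -2 times the log empirical likelihood ratio is asymptotically chi-squared with 2 df
pchisq(-2 * res$logelr, df = 2, lower.tail = FALSE)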


Eskdalemuir Observatory Daily Rainfall

Description

This dataset contains exceedances of 30 mm of daily cumulated rainfall observations over the period 1970-1986. These data were aggregated from hourly series.

Format

a vector with 93 daily cumulated rainfall measurements exceeding 30mm.

Details

The station is one of the rainiest in the whole of the UK, with an average of 1554 mm of cumulated rainfall per year. The data consist of 6209 daily observations, of which 4409 are non-zero. Only the 93 largest observations are provided.

Source

Met Office.


Exponent measure for multivariate generalized Pareto distributions

Description

Integrated intensity over the region defined by[0, z]^c for logistic, Huesler-Reiss, Brown-Resnick and extremal Student processes.

Usage

expme(  z,  par,  model = c("log", "neglog", "hr", "br", "xstud"),  method = c("TruncatedNormal", "mvtnorm", "mvPot"))

Arguments

z

vector at which to estimate exponent measure

par

list of parameters

model

string indicating the model family

method

string indicating the package from which to extract the numerical integration routine

Value

numeric giving the measure of the complement of[0,z].

Note

The list par must contain different arguments depending on the model. For the Brown–Resnick model, the user must supply the conditionally negative definite matrix Lambda following the parametrization in Engelke et al. (2015), or the covariance matrix Sigma following Wadsworth and Tawn (2014). For the Huesler–Reiss model, the user provides the mean and covariance matrix, m and Sigma. For the extremal Student, the covariance matrix Sigma and the degrees of freedom df. For the logistic model, the strictly positive dependence parameter alpha.

Examples

## Not run: 
# Extremal Student
Sigma <- stats::rWishart(n = 1, df = 20, Sigma = diag(10))[, , 1]
expme(z = rep(1, ncol(Sigma)), par = list(Sigma = cov2cor(Sigma), df = 3), model = "xstud")
# Brown-Resnick model
D <- 5L
loc <- cbind(runif(D), runif(D))
di <- as.matrix(dist(rbind(c(0, ncol(loc)), loc)))
semivario <- function(d, alpha = 1.5, lambda = 1) {
  (d / lambda)^alpha
}
Vmat <- semivario(di)
Lambda <- Vmat[-1, -1] / 2
expme(z = rep(1, ncol(Lambda)), par = list(Lambda = Lambda), model = "br", method = "mvPot")
Sigma <- outer(Vmat[-1, 1], Vmat[1, -1], "+") - Vmat[-1, -1]
expme(z = rep(1, ncol(Lambda)), par = list(Lambda = Lambda), model = "br", method = "mvPot")

## End(Not run)
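The examples above cover only the extremal Student and Brown–Resnick models. A minimal sketch for the logistic model is given below; it assumes, as the Note suggests, that the dependence parameter is passed as alpha in the list par.

# Sketch only: exponent measure of a trivariate logistic model
# (parameter name 'alpha' assumed from the Note above)
expme(z = rep(1, 3), par = list(alpha = 0.5), model = "log")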

Extremal index estimators

Description

The function implements estimators of the extremal index based on interexceedance times and gaps of exceedances: the maximum likelihood estimator and the iteratively reweighted least squares estimator of Suveges (2007), as well as the intervals estimator. The implementation differs from the presentation of the paper in that an iteration limit is enforced to make sure the iterative procedure terminates. Multiple thresholds can be supplied.

Usage

ext.index(  xdat,  quantile = 0.95,  method = c("wls", "mle", "intervals"),  plot = FALSE,  warn = FALSE,  ...)

Arguments

xdat

numeric vector of observations

quantile

a vector of quantile levels in (0,1) for estimation of the extremal index. Defaults to 0.95

method

a string specifying the chosen method. Must be either wls for weighted least squares, mle for maximum likelihood estimation, or intervals for the intervals estimator of Ferro and Segers (2003). Partial matching is allowed.

plot

logical; ifTRUE, plot the extremal index as a function ofq

warn

logical; ifTRUE, receive a warning when the sample size is too small

...

additional arguments, for backward compatibility

Details

The iteratively reweighted least squares procedure is based on the gaps of exceedances S_n = T_n - 1. The model is first fitted to non-zero gaps, which are rescaled to have unit exponential scale. The slope between the theoretical quantiles and the normalized gaps of exceedances is b = 1/\theta, with intercept a = \log(\theta)/\theta. As such, the estimate of the extremal index is based on \hat{\theta} = \exp(\hat{a}/\hat{b}). The weights are chosen in such a way as to reduce the influence of the smallest values. The estimator exploits the dual role of \theta as the parameter of the mean for the interexceedance time and as the mixture proportion for the non-zero component.

The maximum likelihood estimator is based on an independence likelihood for the rescaled gaps of exceedances, namely \bar{F}(u_n)S(u_n). The score equation is equivalent to a quadratic equation in \theta, and the maximum likelihood estimate is available in closed form. Its validity, however, requires condition D^{(2)}(u_n) to apply; this should be checked by the user beforehand.

A warning is emitted if the effective sample size is less than 50 observations.

Value

a vector or matrix of estimated extremal indices, of dimension length(method) by length(quantile).

Author(s)

Leo Belzile

References

Ferro and Segers (2003). Inference for clusters of extreme values,JRSS: Series B,65(2), 545-556.

Suveges (2007) Likelihood estimation of the extremal index.Extremes,10(1), 41-55.

Suveges and Davison (2010), Model misspecification in peaks over threshold analysis.Annals of Applied Statistics,4(1), 203-221.

Fukutome, Liniger and Suveges (2015), Automatic threshold and run parameter selection: a climatologyfor extreme hourly precipitation in Switzerland.Theoretical and Applied Climatology,120(3), 403-416.

Examples

## Not run: 
set.seed(234)
# Moving maxima model with theta = 0.5
a <- 1
theta <- 1/(1 + a)
sim <- rgev(10001, loc = 1/(1 + a), scale = 1/(1 + a), shape = 1)
x <- pmax(sim[-length(sim)] * a, sim[-1])
q <- seq(0.9, 0.99, by = 0.01)
ext.index(xdat = x, quantile = q, method = c('wls', 'mle'))

## End(Not run)

Estimators of the extremal coefficient

Description

These functions estimate the extremal coefficient using an approximate sample from the Frechet distribution.

Usage

extcoef(  xdat,  coord = NULL,  thresh = NULL,  estimator = c("schlather", "smith", "fmado"),  margtrans = c("emp", "gev", "none"),  ties.method = "random",  prob = 0,  plot = TRUE,  ...)

Arguments

xdat

an n by D matrix of unit Frechet observations

coord

an optionald byD matrix of location coordinates

thresh

threshold parameter (default is to keep all data ifprob = 0).

estimator

string indicating which estimator to compute, one ofsmith,schlather orfmado.

margtrans

string indicating which method to use to transform the margins to unit Frechet scale, eitheremp for nonparametric transformation via rank transform,gev for fit of generalized extreme value distribution to marginals, ornone

ties.method

method for handling of ties in rank transformation

prob

probability of not exceeding thresholdthresh

plot

logical; should cloud or matrix of pairwise empirical estimates be plotted? Default toTRUE.

...

additional parameters passed to the function, currently ignored.

Details

The Smith estimator: suppose Z(x) is a simple max-stable vector (i.e., with unit Frechet marginals). Then 1/Z is unit exponential and 1/\max\{Z(s_1), Z(s_2)\} is exponential with rate equal to the pairwise extremal coefficient \theta. The extremal coefficient for the pair can therefore be estimated using the reciprocal of the sample mean.
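As an illustration of the reciprocal-mean idea (a sketch of the principle, not of the extcoef interface):

# Sketch: reciprocal-mean (Smith) estimator for one pair of sites
set.seed(1)
z1 <- 1 / rexp(1000)  # unit Frechet margins
z2 <- 1 / rexp(1000)  # independent of z1, so the true pairwise coefficient is 2
theta_hat <- 1 / mean(1 / pmax(z1, z2))
theta_hat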

TheSchlather and Tawn estimator: the likelihood of the naive estimator for a pairof two sitesA is

\mathrm{card}\left\{ j: \max_{i \in A} \left(X_i^{(j)}/\bar{X}_i\right) > z \right\}\log(\theta_A) - \theta_A \sum_{j=1}^n \left[ \max \left\{z, \max_{i \in A}\left(X_i^{(j)}/\bar{X}_i\right)\right\}\right]^{-1},

where \bar{X}_i = n^{-1} \sum_{j=1}^n 1/X_i^{(j)} is the harmonic mean and z is a threshold on the unit Frechet scale. The search for the maximum likelihood estimate for every pair A is restricted to the interval [1,3]. A binned version of the extremal coefficient cloud is also returned. The Schlather estimator is not self-consistent. The Schlather and Tawn estimator includes the Smith estimator as a special case if we do not censor the data (prob = 0) and do not standardize observations by their harmonic mean.

TheF-madogram estimator is a non-parametric estimate based on a stationary processZ; the extremal coefficient satisfies

\theta(h)=\frac{1+2\nu(h)}{1-2\nu(h)},

where

\nu(h) = \frac{1}{2} \mathsf{E}[|F(Z(s+h)) - F(Z(s))|].

The implementation only uses complete pairs to calculate the relative ranks.

All estimators are coded in plain R and the computations are not optimized. The estimation time can therefore be large for large data sets. If there are no missing observations, the routine fmadogram from the SpatialExtremes package should be preferred, as it is noticeably faster.

The data will typically consist of max-stable vectors or block maxima. Both the Smith and the Schlather–Tawn estimators require unit Frechet margins; the margins will be standardized to the unit Frechet scale, either parametrically or nonparametrically, unless margtrans = "none". If margtrans = "gev", a parametric GEV model is fitted to each column of xdat using maximum likelihood estimation and transformed back using the probability integral transform. If margtrans = "emp", the empirical distribution function is used. The latter is the default, as it is appreciably faster.
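The rank-based transform amounts to the following (an illustrative sketch of the idea, not the exact internal code):

# Sketch: empirical (rank) transform of a vector x to the unit Frechet scale
frechet_emp <- function(x) {
  -1 / log(rank(x, ties.method = "random") / (length(x) + 1))
}
frechet_emp(rexp(10))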

Value

an invisible list with vectors dist (if coord is non-null) or else a matrix of pairwise indices ind, extcoef and the supplied estimator, fmado and binned. If estimator == "schlather", an additional two-column matrix binned is returned, containing the binned distances h and the binned extremal coefficient.

References

Schlather, M. and J. Tawn (2003). A dependence measure for multivariate and spatial extremes,Biometrika,90(1), pp.139–156.

Cooley, D., P. Naveau and P. Poncet (2006). Variograms for spatial max-stable random fields, In: Bertail P., Soulier P., Doukhan P. (eds)Dependence in Probability and Statistics. Lecture Notes in Statistics, vol. 187. Springer, New York, NY

R. J. Erhardt, R. L. Smith (2012), Approximate Bayesian computing for spatial extremes,Computational Statistics and Data Analysis,56, pp.1468–1481.

Examples

## Not run: 
coord <- 10 * cbind(runif(50), runif(50))
di <- as.matrix(dist(coord))
dat <- rmev(n = 1000, d = 100, param = 3, sigma = exp(-di / 2), model = 'xstud')
res <- extcoef(xdat = dat, coord = coord)
# Extremal Student extremal coefficient function
XT.extcoeffun <- function(h, nu, corrfun, ...) {
  if (!is.function(corrfun)) {
    stop('Invalid function "corrfun".')
  }
  h <- unique(as.vector(h))
  rhoh <- sapply(h, corrfun, ...)
  cbind(
    h = h,
    extcoef = 2 * pt(sqrt((nu + 1) * (1 - rhoh) / (1 + rhoh)), nu + 1)
  )
}
# This time, only one graph with theoretical extremal coef
plot(res$dist, res$extcoef, ylim = c(1, 2), pch = 20)
abline(v = 2, col = 'gray')
extcoefxt <- XT.extcoeffun(
  seq(0, 10, by = 0.1),
  nu = 3,
  corrfun = function(x) {
    exp(-x / 2)
  })
lines(extcoefxt[, 'h'], extcoefxt[, 'extcoef'], type = 'l', col = 'blue', lwd = 2)
# Brown--Resnick extremal coefficient function
BR.extcoeffun <- function(h, vario, ...) {
  if (!is.function(vario)) {
    stop('Invalid function "vario".')
  }
  h <- unique(as.vector(h))
  gammah <- sapply(h, vario, ...)
  cbind(h = h, extcoef = 2 * pnorm(sqrt(gammah / 4)))
}
extcoefbr <- BR.extcoeffun(
  seq(0, 20, by = 0.25),
  vario = function(x) {
    2 * x^0.7
  })
lines(extcoefbr[, 'h'], extcoefbr[, 'extcoef'], type = 'l', col = 'orange', lwd = 2)
coord <- 10 * cbind(runif(20), runif(20))
di <- as.matrix(dist(coord))
dat <- rmev(n = 1000, d = 20, param = 3, sigma = exp(-di / 2), model = 'xstud')
res <- extcoef(xdat = dat, coord = coord, estimator = "smith")

## End(Not run)

Extended generalised Pareto families of Naveau et al. (2016)

Description

Density function, distribution function, quantile function and random generation for the extended generalized Pareto distribution (GPD) with scale and shape parameters.

Arguments

q

vector of quantiles

x

vector of observations

p

vector of probabilities

n

sample size

prob

mixture probability for modeltype4

kappa

shape parameter fortype1,3 and4

delta

additional parameter fortype2,3 and4

sigma

scale parameter

xi

shape parameter

type

integer between 0 to 5 giving the model choice

step

function of step size for discretization with default0, corresponding to continuous quantiles

log

logical; should the log-density be returned (default toFALSE)?

unifsamp

sample of uniform; if provided, the data will be used in place of new uniform random variates

censoring

numeric vector of length 2 containing the lower and upper bound for censoring

Details

The extended generalized Pareto families proposed in Naveauet al. (2016)retain the tail index of the distribution while being compliant with the theoretical behavior of extremelow rainfall. There are five proposals, the first one being equivalent to the GP distribution.

Usage

pextgp(q, prob=NA, kappa=NA, delta=NA, sigma=NA, xi=NA, type=1)

dextgp(x, prob=NA, kappa=NA, delta=NA, sigma=NA, xi=NA, type=1, log=FALSE)

qextgp(p, prob=NA, kappa=NA, delta=NA, sigma=NA, xi=NA, type=1)

rextgp(n, prob=NA, kappa=NA, delta=NA, sigma=NA, xi=NA, type=1, unifsamp=NULL, censoring=c(0,Inf))
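As a check of the statement in the Details that the first family reduces to the generalized Pareto distribution, densities can be compared directly; the sketch below assumes that type = 1 corresponds to the power carrier G(u) = u^\kappa, so that \kappa = 1 recovers the GP.

# Sketch: with kappa = 1, the first extended family (assumed here to be type = 1)
# coincides with the generalized Pareto density
x <- seq(0.1, 5, by = 0.1)
d1 <- dextgp(x, kappa = 1, sigma = 2, xi = 0.1, type = 1)
d2 <- mev::dgp(x, scale = 2, shape = 0.1)
max(abs(d1 - d2))  # should be numerically zero under the stated assumption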

Author(s)

Raphael Huser and Philippe Naveau

References

Naveau, P., R. Huser, P. Ribereau, and A. Hannart (2016), Modeling jointly low, moderate, and heavy rainfall intensities without a threshold selection,Water Resour. Res., 52, 2753-2769,doi:10.1002/2015WR018552.


Carrier distribution for the extended GP distributions of Naveau et al.

Description

Density, distribution function, quantile function and random numbergeneration for the carrier distributions of the extended Generalized Pareto distributions.

Arguments

u

vector of observations (dextgp.G), probabilities (qextgp.G) or quantiles (pextgp.G), in[0,1]

prob

mixture probability for modeltype4

kappa

shape parameter fortype1,3 and4

delta

additional parameter fortype2,3 and4

type

integer between 0 to 5 giving the model choice

log

logical; should the log-density be returned (default toFALSE)?

n

sample size

unifsamp

sample of uniform; if provided, the data will be used in place of new uniform random variates

censoring

numeric vector of length 2 containing the lower and upper bound for censoring

direct

logical; which method to use for sampling in model oftype4?

Usage

pextgp.G(u, type=1, prob, kappa, delta)

dextgp.G(u, type=1, prob=NA, kappa=NA, delta=NA, log=FALSE)

qextgp.G(u, type=1, prob=NA, kappa=NA, delta=NA)

rextgp.G(n, prob=NA, kappa=NA, delta=NA,type=1, unifsamp=NULL, direct=FALSE, censoring=c(0,1))

Author(s)

Raphael Huser and Philippe Naveau

See Also

extgp


Pairwise extremogram for max-risk functional

Description

The function computes the pairwise chi estimates and plots them as a function of the distance between sites. The function also includes utilities for geometric anisotropy.

Usage

extremo(xdat, margp, coord, scale = 1, rho = 0, plot = FALSE, ...)

Arguments

xdat

data matrix

margp

marginal probability above which to threshold observations

coord

matrix of coordinates (one site per row)

scale

geometric anisotropy scale parameter

rho

geometric anisotropy angle parameter

plot

logical; should a graph of the pairwise estimates against distance be plotted? Defaults to FALSE.

...

additional arguments passed to plot

Value

an invisible matrix with pairwise estimates of chi along with distance (unsorted)

Examples

## Not run: 
lon <- seq(650, 720, length = 10)
lat <- seq(215, 290, length = 10)
# Create a grid
grid <- expand.grid(lon, lat)
coord <- as.matrix(grid)
dianiso <- distg(coord, 1.5, 0.5)
sgrid <- scale(grid, scale = FALSE)
# Specify marginal parameters `loc` and `scale` over grid
eta <- 26 + 0.05 * sgrid[, 1] - 0.16 * sgrid[, 2]
tau <- 9 + 0.05 * sgrid[, 1] - 0.04 * sgrid[, 2]
# Parameter matrix of Huesler--Reiss
# associated to power variogram
Lambda <- ((dianiso / 30)^0.7) / 4
# Regular Euclidean distance between sites
di <- distg(coord, 1, 0)
# Simulate generalized max-Pareto field
set.seed(345)
simu1 <- rgparp(n = 1000, thresh = 50, shape = 0.1, riskf = "max",
                scale = tau, loc = eta, sigma = Lambda, model = "hr")
extdat <- extremo(dat = simu1, margp = 0.98, coord = coord,
                  scale = 1.5, rho = 0.5, plot = TRUE)

# Constrained optimization
# Minimize distance between extremal coefficient from fitted variogram
mindistpvario <- function(par, emp, coord){
  alpha <- par[1]; if(!isTRUE(all(alpha > 0, alpha < 2))){return(1e10)}
  scale <- par[2]; if(scale <= 0){return(1e10)}
  a <- par[3]; if(a < 1){return(1e10)}
  rho <- par[4]; if(abs(rho) >= pi/2){return(1e10)}
  semivariomat <- power.vario(distg(coord, a, rho), alpha = alpha, scale = scale)
  sum((2 * (1 - pnorm(sqrt(semivariomat[lower.tri(semivariomat)] / 2))) - emp)^2)
}
hin <- function(par, ...){
  c(1.99 - par[1], -1e-5 + par[1],
    -1e-5 + par[2],
    par[3] - 1,
    pi/2 - par[4],
    par[4] + pi/2)
}
opt <- alabama::auglag(
  par = c(0.7, 30, 1, 0),
  hin = hin,
  control.outer = list(trace = 0),
  fn = function(par){
    mindistpvario(par, emp = extdat$prob, coord = coord)
  })
stopifnot(opt$kkt1, opt$kkt2)
# Plotting the extremogram in the deformed space
distfa <- distg(loc = coord, opt$par[3], opt$par[4])
plot(
  x = c(distfa[lower.tri(distfa)]),
  y = extdat$prob,
  pch = 20,
  yaxs = "i",
  xaxs = "i",
  bty = 'l',
  xlab = "distance",
  ylab = "cond. prob. of exceedance",
  ylim = c(0, 1))
lines(
  x = (distvec <- seq(0, 200, length = 1000)),
  y = 2 * (1 - pnorm(sqrt(power.vario(distvec, alpha = opt$par[1],
                                      scale = opt$par[2]) / 2))),
  col = 2,
  lwd = 2)

## End(Not run)

Parameter stability plot and maximum likelihood routine for extended GP models

Description

The function tstab.egp provides classical threshold stability plots for (\kappa, \sigma, \xi). The fitted parameter values are displayed with pointwise normal 95% confidence intervals. The function returns an invisible list with parameter estimates and standard errors, and p-values for the Wald test that \kappa = 1. The plot is for the modified scale (as in the generalized Pareto model) and as such it is possible that the modified scale be negative. tstab.egp can also be used to fit the model to multiple thresholds.

Usage

fit.egp(
  xdat,
  thresh = 0,
  model = c("pt-beta", "pt-gamma", "pt-power", "gj-tnorm", "gj-beta", "exptilt", "logist"),
  start = NULL,
  method = c("Nelder", "nlminb", "BFGS"),
  fpar = NULL,
  show = FALSE,
  ...
)

Arguments

xdat

vector of observations, greater than the threshold

thresh

threshold value

model

a string indicating which extended family to fit

start

optional named list of initial values, with components kappa, sigma or xi.

method

the method to be used. SeeDetails. Can be abbreviated.

fpar

a named list with fixed parameters, eitherscale orshape

show

logical; ifTRUE, print the results of the optimization

...

additional parameters, for backward compatibility purposes

Details

fit.egp is a numerical optimization routine to fit the extended generalised Pareto models of Papastathopoulos and Tawn (2013),using maximum likelihood estimation.

Value

fit.egp outputs the list returned byoptim, which contains the parameter values, the hessian and in addition the standard errors

tstab.egp returns plot(s) of the parameter estimates over the range of provided thresholds, with pointwise normal confidence intervals; the function also returns an invisible list containing, notably, the matrix of point estimates (par) and standard errors (se).

Author(s)

Leo Belzile

References

Papastathopoulos, I. and J. Tawn (2013). Extended generalised Pareto models for tail estimation,Journal of Statistical Planning and Inference143(3), 131–143.

Examples

xdat <- mev::rgp(n = 100, loc = 0, scale = 1, shape = 0.5)
fitted <- fit.egp(xdat = xdat, thresh = 1, model = "pt-gamma", show = TRUE)
thresh <- mev::qgp(seq(0.1, 0.5, by = 0.05), 0, 1, 0.5)
tstab.egp(xdat = xdat, thresh = thresh, model = "pt-gamma")
xdat <- regp(n = 100, scale = 1, shape = 0.1, kappa = 0.5, model = "pt-power")
fit.egp(xdat = xdat, model = "pt-power", show = TRUE, fpar = list(kappa = 1), method = "Nelder")

Fit an extended generalized Pareto distribution of Naveau et al.

Description

This is a wrapper function to obtain PWM or MLE estimates for the extended GP models of Naveau et al. (2016) for rainfall intensities. The function calculates confidence intervals by means of nonparametric percentile bootstrap and returns histograms and QQ plots of the fitted distributions. The function handles both censoring and rounding.

Usage

fit.extgp(
  data,
  model = 1,
  method = c("mle", "pwm"),
  init,
  censoring = c(0, Inf),
  rounded = 0,
  confint = FALSE,
  R = 1000,
  ncpus = 1,
  plots = TRUE
)

Arguments

data

data vector.

model

integer ranging from 0 to 4 indicating the model to select (seeextgp).

method

string; either'mle' for maximum likelihood, or'pwm' for probability weighted moments, or both.

init

vector of initial values for the optimization, comprising p, \kappa, \delta, \sigma, \xi (in that order). Not all parameters appear, depending on model.

censoring

numeric vector of length 2 containing the lower and upper bound for censoring;censoring=c(0,Inf) is equivalent to no censoring.

rounded

numeric giving the instrumental precision (and rounding of the data), with default of 0.

confint

logical; should confidence interval be returned (percentile bootstrap).

R

integer; number of bootstrap replications.

ncpus

integer; number of CPUs for parallel calculations (default: 1).

plots

logical; whether to produce histogram and density plots.

Details

The different models include the following transformations:

Author(s)

Raphael Huser and Philippe Naveau

References

Naveau, P., R. Huser, P. Ribereau, and A. Hannart (2016), Modeling jointly low, moderate, and heavy rainfall intensities without a threshold selection,Water Resour. Res., 52, 2753-2769,doi:10.1002/2015WR018552.

See Also

egp.fit,egp,extgp

Examples

## Not run: 
data(rain, package = "ismev")
fit.extgp(rain[rain > 0], model = 1, method = 'mle',
          init = c(0.9, fit.gpd(rain)$est),
          rounded = 0.1, confint = TRUE, R = 20)

## End(Not run)

Maximum likelihood estimation for the generalized extreme value distribution

Description

This function returns an object of class mev_gev, with default methods for printing and quantile-quantile plots. The default starting values are the probability weighted moments solution.

Usage

fit.gev(  xdat,  start = NULL,  method = c("nlminb", "BFGS"),  show = FALSE,  fpar = NULL,  warnSE = FALSE)

Arguments

xdat

a numeric vector of data to be fitted.

start

named list of starting values

method

string indicating the outer optimization routine for the augmented Lagrangian. One ofnlminb orBFGS.

show

logical; if TRUE, print details of the fit.

fpar

a named list with optional fixed componentsloc,scale andshape

warnSE

logical; ifTRUE, a warning is printed if the standard errors cannot be returned from the observed information matrix when the shape is less than -0.5.

Value

a list containing the following components:

Examples

xdat <- mev::rgev(n = 100)
fit.gev(xdat, show = TRUE)
# Example with fixed parameter
fit.gev(xdat, show = TRUE, fpar = list(shape = 0))

Maximum likelihood estimation for the generalized Pareto distribution

Description

Numerical optimization of the generalized Pareto distribution for data exceeding threshold. This function returns an object of class mev_gpd, with default methods for printing and quantile-quantile plots.

Usage

fit.gpd(
  xdat,
  threshold = 0,
  method = "Grimshaw",
  show = FALSE,
  MCMC = NULL,
  k = 4,
  tol = 1e-08,
  fpar = NULL,
  warnSE = FALSE
)

Arguments

xdat

a numeric vector of data to be fitted.

threshold

the chosen threshold.

method

the method to be used. SeeDetails. Can be abbreviated.

show

logical; if TRUE, print details of the fit.

MCMC

NULL for frequentist estimates, otherwise a boolean or a list with parameters passed. IfTRUE, runs a Metropolis-Hastings sampler to get posterior mean estimates. Can be used to pass argumentsniter,burnin andthin to the sampler as a list.

k

bound on the influence function (method = "obre"); the constantk is a robustness parameter(higher bounds are more efficient, low bounds are more robust). Default to 4, must be larger than\sqrt{2}.

tol

numerical tolerance for OBRE weights iterations (method = "obre"). Default to1e-8.

fpar

a named list with fixed parameters, eitherscale orshape

warnSE

logical; ifTRUE, a warning is printed if the standard errors cannot be returned from the observed information matrix when the shape is less than -0.5.

Details

The default method is 'Grimshaw', which maximizes the profile likelihood for the ratio scale/shape. Other options include 'obre' for the optimal B-robust estimator of the parameters of Dupuis (1998), vanilla maximization of the log likelihood using the constrained optimization routine 'auglag', and one-dimensional optimization of the profile likelihood using nlm and optim. Method 'ismev' performs the two-dimensional optimization routine gpd.fit from the ismev library, with the algebraic gradient added. The approximate Bayesian methods ('zs' and 'zhang') are taken respectively from Zhang and Stephens (2009) and Zhang (2010); they consist of an approximate posterior mean calculated via importance sampling, assuming a GPD prior is placed on the parameter of the profile likelihood.
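Because the methods share the same interface, point estimates are easily compared; a brief sketch with simulated data follows (the estimate component is the one returned by the frequentist fits, as in the examples below).

# Sketch: compare generalized Pareto estimates across estimation methods
set.seed(42)
xdat <- mev::rgp(n = 200, scale = 1, shape = 0.1)
sapply(c("Grimshaw", "auglag", "obre"),
       function(m) fit.gpd(xdat, threshold = 0, method = m)$estimate)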

Value

Ifmethod is neither'zs' nor'zhang', a list containing the following components:

Additionally, ifmethod = "obre", a vector of OBREweights.

Otherwise, a list containing

and in addition if MCMC is neitherFALSE, norNULL

Note

Some of the internal functions (which are hidden from the user) allow for modelling of the parameters using covariates. This is not currently implemented withingp.fit, but users can call internal functions should they wish to use these features.

Author(s)

Scott D. Grimshaw for theGrimshaw option. Paul J. Northrop and Claire L. Coleman for the methodsoptim,nlm andismev.J. Zhang and Michael A. Stephens (2009) and Zhang (2010) for thezs andzhang approximate methods and L. Belzile for methodsauglag andobre, the wrapper and MCMC samplers.

If show = TRUE, the optimal B-robust estimated weights for the largest observations are printed along with their p-values, obtained from the empirical distribution of the weights. This diagnostic can be used to guide threshold selection: small weights for the r largest order statistics indicate that the robust fit is driven by the lower tail and that the threshold should perhaps be increased.

References

Davison, A.C. (1984). Modelling excesses over high thresholds, with an application, inStatistical extremes and applications, J. Tiago de Oliveira (editor), D. Reidel Publishing Co., 461–482.

Grimshaw, S.D. (1993). Computing Maximum Likelihood Estimates for the GeneralizedPareto Distribution,Technometrics,35(2), 185–191.

Northrop, P.J. and C. L. Coleman (2014). Improved threshold diagnostic plots for extreme valueanalyses,Extremes,17(2), 289–303.

Zhang, J. (2010). Improving on estimation for the generalized Pareto distribution,Technometrics52(3), 335–339.

Zhang, J. and M. A. Stephens (2009). A new and efficient estimation method for the generalized Pareto distribution.Technometrics51(3), 316–325.

Dupuis, D.J. (1998). Exceedances over High Thresholds: A Guide to Threshold Selection,Extremes,1(3), 251–261.

See Also

fpot andgpd.fit

Examples

data(eskrain)
fit.gpd(eskrain, threshold = 35, method = 'Grimshaw', show = TRUE)
fit.gpd(eskrain, threshold = 30, method = 'zs', show = TRUE)

Maximum likelihood estimation of the point process of extremes

Description

Data above threshold are modelled using the limiting point process of extremes.

Usage

fit.pp(
  xdat,
  threshold = 0,
  npp = 1,
  np = NULL,
  method = c("nlminb", "BFGS"),
  start = NULL,
  show = FALSE,
  fpar = NULL,
  warnSE = FALSE
)

Arguments

xdat

a numeric vector of data to be fitted.

threshold

the chosen threshold.

npp

number of observations per period. See Details.

np

number of periods of data, ifxdat only contains exceedances.

method

the method to be used. SeeDetails. Can be abbreviated.

start

named list of starting values

show

logical; if TRUE, print details of the fit.

fpar

a named list with optional fixed componentsloc,scale andshape

warnSE

logical; ifTRUE, a warning is printed if the standard errors cannot be returned from the observed information matrix when the shape is less than -0.5.

Details

The parameter npp controls the frequency of observations. If data are recorded on a daily basis, using a value of npp = 365.25 yields location and scale parameters that correspond to those of the generalized extreme value distribution fitted to block maxima.
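A sketch of this correspondence with simulated daily data is given below; the estimate component is assumed to carry the fitted (loc, scale, shape) values, as for the other fitting routines in the package.

# Sketch: point process fit with npp = 365.25 vs. GEV fit to yearly maxima
set.seed(1)
daily <- mev::rgev(n = 365 * 50, loc = 20, scale = 3, shape = 0.1)
pp_fit <- fit.pp(daily, threshold = quantile(daily, 0.98), npp = 365.25)
gev_fit <- fit.gev(apply(matrix(daily, ncol = 50), 2, max))
rbind(pp = pp_fit$estimate, gev = gev_fit$estimate)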

Value

a list containing the following components:

References

Coles, S. (2001), An introduction to statistical modelling of extreme values. Springer : London, 208p.

Examples

data(eskrain)
pp_mle <- fit.pp(eskrain, threshold = 30, np = 6201)
plot(pp_mle)

Estimator of the second order tail index parameter

Description

Estimator of the second order tail index parameter

Usage

fit.rho(xdat, k, method = c("fagh", "dk", "ghp", "gbw"), ...)

Arguments

xdat

vector of positive observations

k

number of highest order statistics to use for estimation

method

string for the estimator

...

additional arguments passed to individual routinescurrently ignored.

Examples

# Example with rho = -0.2
n <- 1000
xdat <- mev::rgp(n = n, shape = 0.2)
kmin <- floor(n^0.995)
kmax <- ceiling(n^0.999)
rho_est <- fit.rho(
  xdat = xdat,
  k = n - kmin:kmax)
rho_med <- mean(rho_est$rho)

Maximum likelihood estimates of point process for the r-largest observations

Description

This uses a constrained optimization routine to return the maximum likelihood estimate based on an n by r matrix of observations. Observations should be ordered, i.e., the r-largest observations should be in the last column.

Usage

fit.rlarg(  xdat,  start = NULL,  method = c("nlminb", "BFGS"),  show = FALSE,  fpar = NULL,  warnSE = FALSE)

Arguments

xdat

a numeric vector of data to be fitted.

start

named list of starting values

method

the method to be used. SeeDetails. Can be abbreviated.

show

logical; if TRUE, print details of the fit.

fpar

a named list with fixed parameters, eitherscale orshape

warnSE

logical; ifTRUE, a warning is printed if the standard errors cannot be returned from the observed information matrix when the shape is less than -0.5.

Value

a list containing the following components:

Examples

xdat <- rrlarg(n = 10, loc = 0, scale = 1, shape = 0.1, r = 4)
fit.rlarg(xdat)

Shape parameter estimates

Description

Wrapper to estimate the tail index or shape parameter of an extreme value distribution. Each function has a similar set of arguments: a vector or scalar number of order statistics k and a vector of positive observations xdat. The method argument allows users to choose between different estimators, including the Hill estimator (hill, for positive observations and positive shape only), the moment estimator of Dekkers and de Haan (mom or dekkers), the de Vries estimator of de Haan and Peng (vries), the generalized jackknife estimator of Gomes et al. (genjack), the Beirlant, Vynckier and Teugels generalized quantile estimator (bvt or genquant), the Pickands estimator (pickands), the extreme U-statistics estimator of Oorschot, Segers and Zhou (osz), or the exponential regression model of Beirlant et al. (erm).
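A brief sketch comparing two of these estimators over a grid of k, using the k and shape columns documented in the Value section:

# Sketch: Hill and moment estimates of the shape over a range of k
set.seed(2024)
xdat <- mev::rgp(n = 1000, scale = 1, shape = 0.25)
k <- seq(20, 500, by = 20)
hill <- fit.shape(xdat, k = k, method = "hill")
mom <- fit.shape(xdat, k = k, method = "mom")
plot(hill$k, hill$shape, type = "b", xlab = "k", ylab = "shape estimate")
lines(mom$k, mom$shape, type = "b", col = 2)
abline(h = 0.25, lty = 2)  # true shape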

Usage

fit.shape(
  xdat,
  k,
  method = c("hill", "rbm", "osz", "vries", "genjack", "mom", "dekkers", "genquant", "pickands", "erm"),
  ...
)

Arguments

xdat

vector of positive observations of lengthn

k

number of largest order statistics

method

estimation method.

...

additional parameters passed to functions

Value

a data frame with the number of order statisticsk and the shape parameter estimateshape, or a single numeric value ifk is a scalar.


Maximum likelihood estimation for weighted generalized Pareto distribution

Description

Weighted maximum likelihood estimation, with user-specified vector of weights.

Usage

fit.wgpd(xdat, threshold = 0, weightfun = Stein_weights, start = NULL, ...)

Arguments

xdat

vector of observations

threshold

numeric, value of the threshold

weightfun

function whose first argument is the length of the weight vector

start

optional vector of scale and shape parameters for the optimization routine, defaults toNULL

...

additional arguments passed to the weighting functionweightfun

Value

a list with components


French wind data

Description

Daily mean wind speed (in km/h) at four stations in the south of France, namely Cap Cepet (S1), Lyon St-Exupery (S2), Marseille Marignane (S3) and Montelimar (S4). The data include observations from January 1976 until April 2023; days containing missing values are omitted.

Format

A data frame with 17209 observations and 8 variables:

date

date of measurement

S1

wind speed (in km/h) at Cap Cepet

S2

wind speed (in km/h) at Lyon Saint-Exupery

S3

wind speed (in km/h) at Marseille Marignane

S4

wind speed (in km/h) at Montelimar

H2

humidity (in percentage) at Lyon Saint-Exupery

T2

mean temperature (in degrees Celsius) at Lyon Saint-Exupery

Themetadata attribute includes latitude and longitude (in degrees, minutes, seconds), altitude (in m), station name and station id.

Source

European Climate Assessment and Dataset projecthttps://www.ecad.eu/

References

Klein Tank, A.M.G. and Coauthors, 2002. Daily dataset of 20th-century surface air temperature and precipitation series for theEuropean Climate Assessment. Int. J. of Climatol., 22, 1441-1453.

Examples

data(frwind, package = "mev")
head(frwind)
attr(frwind, which = "metadata")

Magnetic storms

Description

Absolute magnitude of 373 geomagnetic storms lasting more than 48h with absolute magnitude (dst) larger than 100 in 1957-2014.

Format

a vector of size 373

Note

For a detailed article presenting the derivation of the Dst index, seehttp://wdc.kugi.kyoto-u.ac.jp/dstdir/dst2/onDstindex.html

Source

Aki Vehtari

References

World Data Center for Geomagnetism, Kyoto, M. Nose, T. Iyemori, M. Sugiura, T. Kamei (2015),Geomagnetic Dst index, <doi:10.17593/14515-74000>.


Generalized extreme value distribution

Description

Likelihood, score function and information matrix, bias, approximate ancillary statistics and sample space derivative for the generalized extreme value distribution.

Arguments

par

vector ofloc,scale andshape

dat

sample vector

method

string indicating whether to use the expected ('exp') or the observed ('obs' - the default) information matrix.

V

vector calculated bygev.Vfun

n

sample size

p

vector of probabilities

Usage

gev.ll(par, dat)
gev.ll.optim(par, dat)
gev.score(par, dat)
gev.infomat(par, dat, method = c('obs', 'exp'))
gev.retlev(par, p)
gev.bias(par, n)
gev.Fscore(par, dat, method = c('obs', 'exp'))
gev.Vfun(par, dat)
gev.phi(par, dat, V)
gev.dphi(par, dat, V)

Functions

References

Firth, D. (1993). Bias reduction of maximum likelihood estimates,Biometrika,80(1), 27–38.

Coles, S. (2001).An Introduction to Statistical Modeling of Extreme Values, Springer, 209 p.

Cox, D. R. and E. J. Snell (1968). A general definition of residuals,Journal of the Royal Statistical Society: Series B (Methodological),30, 248–275.

Cordeiro, G. M. and R. Klein (1994). Bias correction in ARMA models,Statistics and Probability Letters,19(3), 169–176.


Firth's modified score equation for the generalized extreme value distribution

Description

Firth's modified score equation for the generalized extreme value distribution

Usage

gev.Fscore(par, dat, method = "obs")

Arguments

par

vector ofloc,scale andshape

dat

sample vector

method

string indicating whether to use the expected ('exp') or the observed ('obs' - the default) information matrix.

References

Firth, D. (1993). Bias reduction of maximum likelihood estimates,Biometrika,80(1), 27–38.

See Also

gev


N-year return levels, median and mean estimate

Description

N-year return levels, median and mean estimate

Usage

gev.Nyr(par, nobs, N, type = c("retlev", "median", "mean"), p = 1/N)

Arguments

par

vector of location, scale and shape parameters for the GEV distribution

nobs

integer number of observations on which the fit is based

N

integer number of observations for return level. SeeDetails

type

string indicating the statistic to be calculated (can be abbreviated).

p

probability indicating the return level, corresponding to the quantile of order 1-p

Details

If there are n_y observations per year, the L-year return level is obtained by taking N equal to n_y L.
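For example, with annual maxima (n_y = 1) the 50-year return level uses N = 50; a minimal sketch with illustrative parameter values:

# Sketch: 50-year return level for illustrative GEV parameters
# (loc, scale, shape) = (1, 2, 0.1), with a fit based on 80 observations
gev.Nyr(par = c(1, 2, 0.1), nobs = 80, N = 50, type = "retlev")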

Value

a list with components


Asymptotic bias of block maxima for fixed sample sizes

Description

Asymptotic bias of block maxima for fixed sample sizes

Usage

gev.abias(shape, rho)

Arguments

shape

shape parameter

rho

second-order parameter, non-positive

Value

a vector of length three containing the bias for location, scale and shape (in this order)

References

Dombry, C. and A. Ferreira (2017). Maximum likelihood estimators based on the block maxima method.https://arxiv.org/abs/1705.00465


Bias correction for GEV distribution

Description

Bias corrected estimates for the generalized extreme value distribution usingFirth's modified score function or implicit bias subtraction.

Usage

gev.bcor(par, dat, corr = c("subtract", "firth"), method = c("obs", "exp"))

Arguments

par

parameter vector (scale,shape)

dat

sample of observations

corr

string indicating which correction to employ eithersubtract orfirth

method

string indicating whether to use the expected ('exp') or the observed ('obs' — the default) information matrix. Used only ifcorr='firth'

Details

Method subtract solves

\tilde{\boldsymbol{\theta}} = \hat{\boldsymbol{\theta}} + b(\tilde{\boldsymbol{\theta}})

for \tilde{\boldsymbol{\theta}}, using the first order term in the bias expansion as given by gev.bias.

The alternative is to use Firth's modified score and find the root of

U(\tilde{\boldsymbol{\theta}})-i(\tilde{\boldsymbol{\theta}})b(\tilde{\boldsymbol{\theta}}),

whereU is the score vector,b is the first order bias andi is either the observed or Fisher information.

The routine uses the MLE (bias-corrected) as starting value and proceeds to find the solution using a root finding algorithm. Since the bias correction is not valid for \xi < -1/3, any solution that is unbounded will return a vector of NA, as the solution does not exist then.

Value

vector of bias-corrected parameters

Examples

set.seed(1)
dat <- mev::rgev(n = 40, loc = 1, scale = 1, shape = -0.2)
par <- mev::fit.gev(dat)$estimate
gev.bcor(par, dat, 'subtract')
gev.bcor(par, dat, 'firth')  # observed information
gev.bcor(par, dat, 'firth', 'exp')

Cox-Snell first order bias for the GEV distribution

Description

Bias vector for the GEV distribution based on an n sample. Due to numerical instability, values of the information matrix and of the bias are linearly interpolated when the shape parameter is close to zero.

Usage

gev.bias(par, n)

Arguments

par

vector ofloc,scale andshape

n

sample size

See Also

gev


Information matrix for the generalized extreme value distribution

Description

Information matrix for the generalized extreme value distribution

Usage

gev.infomat(par, dat, method = c("obs", "exp"), nobs = length(dat))

Arguments

par

vector ofloc,scale andshape

dat

sample vector

method

string indicating whether to use the expected ('exp') or the observed ('obs' - the default) information matrix.

nobs

number of observations

Value

a 3 by 3 matrix containing the expected or observed information


Log likelihood for the generalized extreme value distribution

Description

Function returning the log likelihood of an n sample from the GEV distribution.

Usage

gev.ll(par, dat)
gev.ll.optim(par, dat)

Arguments

par

vector ofloc,scale andshape

dat

sample vector

Details

gev.ll.optim returns the negative log likelihood parametrized in terms of location, log(scale) and shape, in order to perform unconstrained optimization.
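A minimal sketch of such an unconstrained optimization (the starting values are illustrative):

# Sketch: maximize the GEV likelihood with optim via the
# (loc, log(scale), shape) parametrization of gev.ll.optim
set.seed(10)
dat <- mev::rgev(n = 100, loc = 0, scale = 2, shape = 0.1)
opt <- optim(par = c(mean(dat), log(sd(dat)), 0.1), fn = gev.ll.optim, dat = dat)
c(loc = opt$par[1], scale = exp(opt$par[2]), shape = opt$par[3])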

See Also

gev


Generalized extreme value maximum likelihood estimates for various quantities of interest

Description

This function calls the fit.gev routine on the sample of block maxima and returns maximum likelihood estimates for all quantities of interest, including location, scale and shape parameters, quantiles, and the mean and quantiles of maxima of N blocks.

Usage

gev.mle(  xdat,  args = c("loc", "scale", "shape", "quant", "Nmean", "Nquant"),  N,  p,  q)

Arguments

xdat

sample vector of maxima

args

vector of strings indicating which arguments to return the maximum likelihood values for.

N

size of block over which to take maxima. Required only forargsNmean andNquant.

p

tail probability. Required only forargquant.

q

level of quantile for maxima ofN exceedances. Required only forargsNquant.

Value

named vector with maximum likelihood estimated parameter values for argumentsargs

Examples

dat <- mev::rgev(n = 100, shape = 0.2)
gev.mle(xdat = dat, N = 100, p = 0.01, q = 0.5)

Profile log-likelihood for the generalized extreme value distribution

Description

This function calculates the profile likelihood, along with two small-sample corrections based on Severini's empirical covariance and the Fraser and Reid tangent exponential model approximation.

Usage

gev.pll(
  psi,
  param = c("loc", "scale", "shape", "quant", "Nmean", "Nquant"),
  mod = "profile",
  dat,
  N = NULL,
  p = NULL,
  q = NULL,
  correction = TRUE,
  plot = TRUE,
  ...
)

Arguments

psi

parameter vector over which to profile (unidimensional)

param

string indicating the parameter to profile over

mod

string indicating the model, one ofprofile,tem ormodif.SeeDetails.

dat

sample vector

N

size of block over which to take maxima. Required only forparamNmean andNquant.

p

tail probability. Required only forparamquant.

q

probability level of quantile. Required only forparamNquant.

correction

logical indicating whether to usespline.corr to smooth the tem approximation.

plot

logical; should the profile likelihood be displayed? Default toTRUE

...

additional arguments, such as the output from a call to Vfun if mod = 'tem'.

Details

The two additional mod options available are tem, the tangent exponential model (TEM) approximation, and modif for the penalized profile likelihood based on the p^* approximation proposed by Severini. For the latter, the penalization is based on the TEM or an empirical covariance adjustment term.

Value

a list with components

In addition, ifmod includestem

In addition, ifmod includesmodif

References

Fraser, D. A. S., Reid, N. and Wu, J. (1999), A simple general formula for tail probabilities for frequentist and Bayesian inference.Biometrika,86(2), 249–264.

Severini, T. (2000) Likelihood Methods in Statistics. Oxford University Press. ISBN 9780198506508.

Brazzale, A. R., Davison, A. C. and Reid, N. (2007) Applied asymptotics: case studies in small-sample statistics. Cambridge University Press, Cambridge. ISBN 978-0-521-84703-2

Examples

## Not run: 
set.seed(123)
dat <- rgev(n = 100, loc = 0, scale = 2, shape = 0.3)
gev.pll(psi = seq(0, 0.5, length = 50), param = 'shape', dat = dat)
gev.pll(psi = seq(-1.5, 1.5, length = 50), param = 'loc', dat = dat)
gev.pll(psi = seq(10, 40, length = 50), param = 'quant', dat = dat, p = 0.01)
gev.pll(psi = seq(12, 100, length = 50), param = 'Nmean', N = 100, dat = dat)
gev.pll(psi = seq(12, 90, length = 50), param = 'Nquant', N = 100, dat = dat, q = 0.5)

## End(Not run)

Return level for the generalized extreme value distribution

Description

This function returns the (1-p)th quantile of the GEV distribution.

Usage

gev.retlev(par, p)

Arguments

par

vector ofloc,scale andshape

p

vector of probabilities

See Also

gev


Score vector for the generalized extreme value distribution

Description

Score vector for the generalized extreme value distribution

Usage

gev.score(par, dat)

Arguments

par

vector ofloc,scale andshape

dat

sample vector

Value

a vector of length 3 containing the score, i.e., the gradient of the log likelihood with respect to loc, scale and shape


Tangent exponential model approximation for the GEV distribution

Description

The function gev.tem provides a tangent exponential model (TEM) approximation for higher order likelihood inference on a scalar parameter of the generalized extreme value distribution. Options include the location, scale and shape parameters, as well as value-at-risk (or return levels). The function attempts to find good values for psi that will cover the range of options, but the fit may fail and return an error.

Usage

gev.tem(
  param = c("loc", "scale", "shape", "quant", "Nmean", "Nquant"),
  dat,
  psi = NULL,
  p = NULL,
  q = 0.5,
  N = NULL,
  n.psi = 50,
  plot = TRUE,
  correction = TRUE
)

Arguments

param

parameter over which to profile

dat

sample vector for the GEV distribution

psi

scalar or ordered vector of values for the interest parameter. IfNULL (default), a grid of values centered at the MLE is selected

p

tail probability for the (1-p)th quantile (return levels). Required only if param = 'quant'.

q

probability level of quantile. Required only forparamNquant.

N

size of block over which to take maxima. Required only forparamNmean andNquant.

n.psi

number of values ofpsi at which the likelihood is computed, ifpsi is not supplied (NULL). Odd values are more prone to give rise to numerical instabilities near the MLE. Ifpsi is a vector of length 2 andn.psi is greater than 2, these are taken to be endpoints of the sequence.

plot

logical indicating whetherplot.fr should be called upon exit

correction

logical indicating whetherspline.corr should be called.

Value

an invisible object of classfr (seetem in packagehoa) with elements

Author(s)

Leo Belzile

Examples

## Not run: 
set.seed(1234)
dat <- rgev(n = 40, loc = 0, scale = 2, shape = -0.1)
gev.tem('shape', dat = dat, plot = TRUE)
gev.tem('quant', dat = dat, p = 0.01, plot = TRUE)
gev.tem('scale', psi = seq(1, 4, by = 0.1), dat = dat, plot = TRUE)
dat <- rgev(n = 40, loc = 0, scale = 2, shape = 0.2)
gev.tem('loc', dat = dat, plot = TRUE)
gev.tem('Nmean', dat = dat, p = 0.01, N = 100, plot = TRUE)
gev.tem('Nquant', dat = dat, q = 0.5, N = 100, plot = TRUE)

## End(Not run)

Tangent exponential model statistics for the generalized extreme value distribution

Description

Matrix of approximate ancillary statistics, sample space derivative of the log likelihood and mixed derivative for the generalized extreme value distribution.

Usage

gev.Vfun(par, dat)
gev.phi(par, dat, V)
gev.dphi(par, dat, V)

Arguments

par

vector ofloc,scale andshape

dat

sample vector

V

vector calculated bygev.Vfun

See Also

gev


Generalized extreme value distribution (quantile/mean of N-block maxima parametrization)

Description

Likelihood, score function and information matrix, approximate ancillary statistics and sample space derivative for the generalized extreme value distribution parametrized in terms of the quantile/mean of the N-block maximum z, scale and shape.

Arguments

par

vector ofloc, quantile/mean of N-block maximum andshape

dat

sample vector

V

vector calculated bygevN.Vfun

q

probability, corresponding toqth quantile of theN-block maximum

qty

string indicating whether to calculate theq quantile or the mean

Usage

gevN.ll(par, dat, N, q, qty = c('mean', 'quantile'))
gevN.ll.optim(par, dat, N, q = 0.5, qty = c('mean', 'quantile'))
gevN.score(par, dat, N, q = 0.5, qty = c('mean', 'quantile'))
gevN.infomat(par, dat, qty = c('mean', 'quantile'), method = c('obs', 'exp'), N, q = 0.5, nobs = length(dat))
gevN.Vfun(par, dat, N, q = 0.5, qty = c('mean', 'quantile'))
gevN.phi(par, dat, N, q = 0.5, qty = c('mean', 'quantile'), V)
gevN.dphi(par, dat, N, q = 0.5, qty = c('mean', 'quantile'), V)

Functions

Author(s)

Leo Belzile


Information matrix of the generalized extreme value distribution (quantile/mean of N-block maxima parametrization)

Description

Information matrix of the generalized extreme value distribution (quantile/mean of N-block maxima parametrization)

Usage

gevN.infomat(  par,  dat,  method = c("obs", "exp"),  qty = c("mean", "quantile"),  N,  q = 0.5,  nobs = length(dat))

Arguments

par

vector ofloc, quantile/mean of N-block maximum andshape

dat

sample vector

qty

string indicating whether to calculate theq quantile or the mean

q

probability, corresponding toqth quantile of theN-block maximum

See Also

gevN


Negative log likelihood of the generalized extreme value distribution (quantile/mean of N-block maxima parametrization)

Description

Negative log likelihood of the generalized extreme value distribution (quantile/mean of N-block maxima parametrization)

Usage

gevN.ll(par, dat, N, q = 0.5, qty = c("mean", "quantile"))

Arguments

par

vector ofloc, quantile/mean of N-block maximum andshape

dat

sample vector

q

probability, corresponding toqth quantile of theN-block maximum

qty

string indicating whether to calculate theq quantile or the mean

See Also

gevN


This function returns the mean of N observations from the GEV.

Description

This function returns the mean of N observations from the GEV.

Usage

gevN.mean(par, N)

Arguments

par

vector ofloc, quantile/mean of N-block maximum andshape

See Also

gevN


Score of the generalized extreme value distribution (quantile/mean of N-block maxima parametrization)

Description

Score of the generalized extreme value distribution (quantile/mean of N-block maxima parametrization)

Usage

gevN.score(par, dat, N, q = 0.5, qty = c("mean", "quantile"))

Arguments

par

vector ofloc, quantile/mean of N-block maximum andshape

dat

sample vector

q

probability, corresponding toqth quantile of theN-block maximum

qty

string indicating whether to calculate theq quantile or the mean

See Also

gevN


Tangent exponential model statistics for the generalized extreme value distribution (quantile/mean of N-block maxima parametrization)

Description

Vector implementing conditioning on approximate ancillary statistics for the TEM

Usage

gevN.Vfun(par, dat, N, q = 0.5, qty = c("mean", "quantile"))
gevN.phi(par, dat, N, q = 0.5, qty = c("mean", "quantile"), V)
gevN.dphi(par, dat, N, q = 0.5, qty = c("mean", "quantile"), V)

Arguments

par

vector ofloc, quantile/mean of N-block maximum andshape

dat

sample vector

q

probability, corresponding toqth quantile of theN-block maximum

qty

string indicating whether to calculate theq quantile or the mean

V

vector calculated bygevN.Vfun

See Also

gevN


Generalized extreme value distribution

Description

Density function, distribution function, quantile function andrandom number generation for the generalized extreme valuedistribution.

Usage

qgev(p, loc = 0, scale = 1, shape = 0, lower.tail = TRUE, log.p = FALSE)
rgev(n, loc = 0, scale = 1, shape = 0)
dgev(x, loc = 0, scale = 1, shape = 0, log = FALSE)
pgev(q, loc = 0, scale = 1, shape = 0, lower.tail = TRUE, log.p = FALSE)

Arguments

p

vector of probabilities

loc

scalar or vector of location parameters whose length matches that of the input

scale

scalar or vector of positive scale parameters whose length matches that of the input

shape

scalar shape parameter

lower.tail

logical; ifTRUE (default), returns the distribution function, otherwise the survival function

n

scalar number of observations

x,q

vector of quantiles

log,log.p

logical; ifTRUE, probabilitiesp are given as\log(p).

Details

The distribution function of a GEV distribution with parameters loc = \mu, scale = \sigma and shape = \xi is

F(x) = \exp\{-[1 + \xi (x - \mu) / \sigma] ^ {-1/\xi} \}

for 1 + \xi (x - \mu) / \sigma > 0. If \xi = 0, the distribution function is defined as the limit as \xi tends to zero.

The quantile function, when evaluated at zero or one, returns the lower and upper endpoints, whether the latter is finite or not.
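For instance, with a negative shape the upper endpoint \mu - \sigma/\xi is finite while the lower endpoint is infinite, and conversely for a positive shape:

# Endpoints returned by the quantile function at probabilities 0 and 1
qgev(c(0, 1), loc = 0, scale = 1, shape = -0.5)  # -Inf and finite upper endpoint 2
qgev(c(0, 1), loc = 0, scale = 1, shape = 0.5)   # finite lower endpoint -2 and Inf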

Author(s)

Leo Belzile, with code adapted from Paul Northrop

References

Jenkinson, A. F. (1955) The frequency distribution of the annual maximum (or minimum) of meteorological elements. Quart. J. R. Met. Soc., 81, 158-171. doi:10.1002/qj.49708134804

Coles, S. G. (2001) An Introduction to Statistical Modeling of Extreme Values, Springer-Verlag, London. Chapter 3: doi:10.1007/978-1-4471-3675-0_3


Generalized extreme value distribution (return level parametrization)

Description

Likelihood, score function and information matrix,approximate ancillary statistics and sample space derivativefor the generalized extreme value distribution parametrized in terms of the return levelz, scale and shape.

Arguments

par

vector ofretlev,scale andshape

dat

sample vector

p

tail probability, corresponding to(1-p)th quantile forz

method

string indicating whether to use the expected ('exp') or the observed ('obs' - the default) information matrix.

nobs

number of observations

V

vector calculated bygevr.Vfun

Usage

gevr.ll(par, dat, p)
gevr.ll.optim(par, dat, p)
gevr.score(par, dat, p)
gevr.infomat(par, dat, p, method = c('obs', 'exp'), nobs = length(dat))
gevr.Vfun(par, dat, p)
gevr.phi(par, dat, p, V)
gevr.dphi(par, dat, p, V)

Functions

Author(s)

Leo Belzile


Observed information matrix for GEV distribution (return levels)

Description

The information matrix is parametrized in terms of return level ((1-p)th quantile), scale and shape.

Usage

gevr.infomat(par, dat, method = c("obs", "exp"), p, nobs = length(dat))

Arguments

par

vector ofretlev,scale andshape

dat

sample vector

method

string indicating whether to use the expected ('exp') or the observed ('obs' - the default) information matrix.

p

tail probability, corresponding to(1-p)th quantile forz

nobs

integer number of observations

See Also

gevr


Negative log likelihood of the generalized extreme value distribution (return levels)

Description

Negative log likelihood of the generalized extreme value distribution (return levels)

Negative log likelihood parametrized in terms of location, log return level and shape in order to perform unconstrained optimization

Usage

gevr.ll(par, dat, p)
gevr.ll.optim(par, dat, p)

Arguments

par

vector ofretlev,scale andshape

dat

sample vector

p

tail probability, corresponding to(1-p)th quantile forz

See Also

gevr


Score of the log likelihood for the GEV distribution (return levels)

Description

Score of the log likelihood for the GEV distribution (return levels)

Usage

gevr.score(par, dat, p)

Arguments

par

vector ofretlev,scale andshape

dat

sample vector

p

tail probability, corresponding to(1-p)th quantile forz

See Also

gevr


Tangent exponential model statistics for the GEV distribution (return level)

Description

Vector implementing conditioning on approximate ancillary statistics for the TEM

Usage

gevr.Vfun(par, dat, p)
gevr.phi(par, dat, p, V)
gevr.dphi(par, dat, p, V)

Arguments

par

vector ofretlev,scale andshape

dat

sample vector

p

tail probability, corresponding to(1-p)th quantile forz

V

vector calculated bygevr.Vfun

See Also

gevr


Maximum likelihood estimate of generalized Pareto applied to threshold exceedances

Description

The functionfit.gpd is a wrapper aroundgp.fit

Usage

gp.fit(
  xdat,
  threshold,
  method = c("Grimshaw", "auglag", "nlm", "optim", "ismev", "zs", "zhang"),
  show = FALSE,
  MCMC = NULL,
  fpar = NULL,
  warnSE = TRUE
)

Arguments

xdat

a numeric vector of data to be fitted.

threshold

the chosen threshold.

method

the method to be used. SeeDetails. Can be abbreviated.

show

logical; if TRUE, print details of the fit.

MCMC

NULL for frequentist estimates, otherwise a boolean or a list with parameters passed. IfTRUE, runs a Metropolis-Hastings sampler to get posterior mean estimates. Can be used to pass argumentsniter,burnin andthin to the sampler as a list.

fpar

a named list with fixed parameters, eitherscale orshape

warnSE

logical; ifTRUE, a warning is printed if the standard errors cannot be returned from the observed information matrix when the shape is less than -0.5.

Value

a list; seefit.gpd for details


Generalized Pareto distribution

Description

Likelihood, score function and information matrix, bias, approximate ancillary statistics and sample space derivative for the generalized Pareto distribution.

Arguments

par

vector ofscale andshape

dat

sample vector

tol

numerical tolerance for the exponential model

method

string indicating whether to use the expected ('exp') or the observed ('obs' - the default) information matrix.

V

vector calculated bygpd.Vfun

n

sample size

Usage

gpd.ll(par, dat, tol = 1e-5)
gpd.ll.optim(par, dat, tol = 1e-5)
gpd.score(par, dat)
gpd.infomat(par, dat, method = c('obs', 'exp'))
gpd.bias(par, n)
gpd.Fscore(par, dat, method = c('obs', 'exp'))
gpd.Vfun(par, dat)
gpd.phi(par, dat, V)
gpd.dphi(par, dat, V)

Functions

Author(s)

Leo Belzile

References

Firth, D. (1993). Bias reduction of maximum likelihood estimates,Biometrika,80(1), 27–38.

Coles, S. (2001).An Introduction to Statistical Modeling of Extreme Values, Springer, 209 p.

Cox, D. R. and E. J. Snell (1968). A general definition of residuals,Journal of the Royal Statistical Society: Series B (Methodological),30, 248–275.

Cordeiro, G. M. and R. Klein (1994). Bias correction in ARMA models,Statistics and Probability Letters,19(3), 169–176.

Giles, D. E., Feng, H. and R. T. Godwin (2016). Bias-corrected maximum likelihood estimation of the parameters of the generalized Pareto distribution,Communications in Statistics - Theory and Methods,45(8), 2465–2483.


Firth's modified score equation for the generalized Pareto distribution

Description

Firth's modified score equation for the generalized Pareto distribution

Usage

gpd.Fscore(par, dat, method = c("obs", "exp"))

Arguments

par

vector ofscale andshape

dat

sample vector

method

string indicating whether to use the expected ('exp') or the observed ('obs' - the default) information matrix.

References

Firth, D. (1993). Bias reduction of maximum likelihood estimates,Biometrika,80(1), 27–38.

See Also

gpd,gpd.bcor


Asymptotic bias of threshold exceedances for k order statistics

Description

The formula is that given in de Haan and Ferreira (2007, Springer). Note that it differs from the formula found in Drees, Ferreira and de Haan.

Usage

gpd.abias(shape, rho)

Arguments

shape

shape parameter

rho

second-order parameter, non-positive

Value

a vector of length 2 containing the bias for the scale and shape (in this order)

References

Dombry, C. and A. Ferreira (2017). Maximum likelihood estimators based on the block maxima method.https://arxiv.org/abs/1705.00465


Bias correction for GP distribution

Description

Bias corrected estimates for the generalized Pareto distribution usingFirth's modified score function or implicit bias subtraction.

Usage

gpd.bcor(par, dat, corr = c("subtract", "firth"), method = c("obs", "exp"))

Arguments

par

parameter vector (scale,shape)

dat

sample of observations

corr

string indicating which correction to employ eithersubtract orfirth

method

string indicating whether to use the expected ('exp') or the observed ('obs' — the default) information matrix. Used only ifcorr='firth'

Details

Method subtract solves

\tilde{\boldsymbol{\theta}} = \hat{\boldsymbol{\theta}} + b(\tilde{\boldsymbol{\theta}})

for \tilde{\boldsymbol{\theta}}, using the first order term in the bias expansion as given by gpd.bias.

The alternative is to use Firth's modified score and find the root of

U(\tilde{\boldsymbol{\theta}})-i(\tilde{\boldsymbol{\theta}})b(\tilde{\boldsymbol{\theta}}),

where U is the score vector, b is the first order bias and i is either the observed or the Fisher information.

The routine uses the MLE as starting value and proceeds to find the solution using a root-finding algorithm. Since the bias correction is not valid for \xi < -1/3, any solution that is unbounded will return a vector of NA, as the bias correction does not exist in that case.

Value

vector of bias-corrected parameters

Examples

set.seed(1)
dat <- rgp(n = 40, scale = 1, shape = -0.2)
par <- gp.fit(dat, threshold = 0, show = FALSE)$estimate
gpd.bcor(par, dat, 'subtract')
gpd.bcor(par, dat, 'firth') # observed information
gpd.bcor(par, dat, 'firth', 'exp')

Cox-Snell first order bias expression for the generalized Pareto distribution

Description

Bias vector for the GP distribution based on ann sample.

Usage

gpd.bias(par, n)

Arguments

par

vector ofscale andshape

n

sample size

References

Coles, S. (2001).An Introduction to Statistical Modeling of Extreme Values, Springer, 209 p.

Cox, D. R. and E. J. Snell (1968). A general definition of residuals,Journal of the Royal Statistical Society: Series B (Methodological),30, 248–275.

Cordeiro, G. M. and R. Klein (1994). Bias correction in ARMA models,Statistics and Probability Letters,19(3), 169–176.

Giles, D. E., Feng, H. and R. T. Godwin (2016). Bias-corrected maximum likelihood estimation of the parameters of the generalized Pareto distribution,Communications in Statistics - Theory and Methods,45(8), 2465–2483.

See Also

gpd,gpd.bcor


Bootstrap approximation for generalized Pareto parameters

Description

Given an object of class mev_gpd, the function returns a matrix of parameter values that mimic the estimation uncertainty.

Usage

gpd.boot(object, B = 1000L, method = c("post", "norm"))

Arguments

object

object of classmev_gpd

B

number of pairs to sample

method

string; one of'norm' for thenormal approximation or'post' (default) for posterior sampling

Details

Two options are available. The first is a normal approximation to the scale and shape based on the maximum likelihood estimates and the observed information matrix; this method uses forward sampling to simulate from a bivariate normal distribution that satisfies the support and positivity constraints.

The second approximation uses the ratio-of-uniforms method to obtain samples from the posterior distribution with uninformative priors, thus mimicking the joint distribution of the maximum likelihood estimators. The benefit of the latter is that it is more reliable in small samples and when the shape is negative.

Value

a matrix of size B by 2 whose columns contain scale and shape parameters
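
A minimal sketch (assuming, as elsewhere in this manual, that fit.gpd returns an object of class mev_gpd):

set.seed(202)
dat <- rgp(n = 500, scale = 2, shape = 0.1)
fit <- fit.gpd(dat, threshold = 0)
boot <- gpd.boot(fit, B = 200L, method = "post")
# approximate 95% intervals for the scale and shape from the bootstrap sample
apply(boot, 2, quantile, probs = c(0.025, 0.975))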


Information matrix for the generalized Pareto distribution

Description

The function returns the expected or observed information matrix.

Usage

gpd.infomat(par, dat, method = c("obs", "exp"), nobs = length(dat))

Arguments

par

vector ofscale andshape

dat

sample vector

method

string indicating whether to use the expected ('exp') or the observed ('obs' - the default) information matrix.

nobs

number of observations

See Also

gpd


Log likelihood for the generalized Pareto distribution

Description

Function returning the density of an n sample from the GP distribution. gpd.ll.optim returns the negative log likelihood parametrized in terms of log(scale) and shape in order to perform unconstrained optimization. The function is coded in such a way that the log likelihood is infinite when \xi < -1.

Usage

gpd.ll(par, dat, tol = 1e-05)
gpd.ll.optim(par, dat, tol = 1e-05)

Arguments

par

vector ofscale andshape

dat

sample vector

tol

numerical tolerance for the exponential model

See Also

gpd
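
A minimal illustration (not part of the original manual): the score evaluated at the maximum likelihood estimate should be approximately zero.

set.seed(303)
dat <- rgp(n = 200, scale = 1, shape = 0.1)
mle <- gp.fit(dat, threshold = 0, show = FALSE)$estimate
gpd.ll(par = mle, dat = dat)                      # log likelihood at the MLE
gpd.score(par = mle, dat = dat)                   # approximately zero
gpd.infomat(par = mle, dat = dat, method = "obs") # observed information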


Estimation of generalized Pareto parameters via L-moments

Description

Given a sample of exceedances, compute the first four L-moments and use either the first two to obtain the scale and shape (default), or else use L-skewness and L-scale to compute the scale and shape of the generalized Pareto distribution

Usage

gpd.lmom(xdat, thresh, sorted = FALSE, Lskew = FALSE)

Arguments

xdat

[numeric] vector of observations

thresh

[numeric] optional threshold argument

sorted

[logical] ifTRUE, observations are sorted in increasing order

Lskew

[logical]; ifTRUE, shape is obtained from L-skewness rather than first two moments.

Value

a vector of length two with the scale and shape estimates
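
A minimal sketch on simulated exceedances (no threshold argument supplied):

set.seed(404)
xdat <- rgp(n = 1000, scale = 2, shape = 0.25)
gpd.lmom(xdat)               # scale and shape from the first two L-moments
gpd.lmom(xdat, Lskew = TRUE) # shape from L-skewness and L-scale instead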


Generalized Pareto maximum likelihood estimates for various quantities of interest

Description

This function calls thefit.gpd routine on the sample of excesses and returns maximum likelihoodestimates for all quantities of interest, including scale and shape parameters, quantiles and value-at-risk,expected shortfall and mean and quantiles of maxima ofN threshold exceedances

Usage

gpd.mle(
  xdat,
  args = c("scale", "shape", "quant", "VaR", "ES", "Nmean", "Nquant"),
  m,
  N,
  p,
  q
)

Arguments

xdat

sample vector of excesses

args

vector of strings indicating which arguments to return the maximum likelihood values for

m

number of observations of interest for return levels. Required only forargs values'VaR' or'ES'

N

size of block over which to take maxima. Required only forargsNmean andNquant.

p

tail probability, equivalent to1/m. Required only forargsquant.

q

level of quantile for N-block maxima. Required only forargsNquant.

Value

named vector with maximum likelihood values for argumentsargs

Examples

xdat <- mev::rgp(n = 30, shape = 0.2)
gpd.mle(xdat = xdat, N = 100, p = 0.01, q = 0.5, m = 100)

Profile log-likelihood for the generalized Pareto distribution

Description

This function calculates the (modified) profile likelihood based on the p^* formula. There are two small-sample corrections that use a proxy for \ell_{\lambda; \hat{\lambda}}, which are based on Severini's (1999) empirical covariance and the Fraser and Reid tangent exponential model approximation.

Usage

gpd.pll(
  psi,
  param = c("scale", "shape", "quant", "retlev", "VaR", "ES", "Nmean", "Nquant"),
  mod = "profile",
  mle = NULL,
  dat,
  m = NULL,
  N = NULL,
  p = NULL,
  q = NULL,
  correction = TRUE,
  thresh = NULL,
  plot = TRUE,
  ...
)

Arguments

psi

parameter vector over which to profile (unidimensional)

param

string indicating the parameter to profile over

mod

string indicating the model. SeeDetails.

mle

maximum likelihood estimate in(\psi, \xi) parametrization if\psi \neq \xi and(\sigma, \xi) otherwise (optional).

dat

sample vector of excesses, unlessthresh is provided (in which case user provides original data)

m

number of observations of interest for return levels. Required only forargs values'VaR' or'ES'

N

size of block over which to take maxima. Required only forargsNmean andNquant.

p

tail probability, equivalent to1/m. Required only forargsquant.

q

level of quantile for N-block maxima. Required only forargsNquant.

correction

logical indicating whether to usespline.corr to smooth the tem approximation.

thresh

numerical threshold above which to fit the generalized Pareto distribution

plot

logical; should the profile likelihood be displayed? Default toTRUE

...

additional arguments such as output from call toVfun ifmode='tem'.

Details

The three options for mod are profile (the default), tem for the tangent exponential model (TEM) approximation, and modif for the penalized profile likelihood based on the p^* approximation proposed by Severini. For the latter, the penalization is based on the TEM or an empirical covariance adjustment term.

Value

a list with components

In addition, ifmod includestem

In addition, ifmod includesmodif

Examples

## Not run: 
dat <- rgp(n = 100, scale = 2, shape = 0.3)
gpd.pll(psi = seq(-0.5, 1, by = 0.01), param = 'shape', dat = dat)
gpd.pll(psi = seq(0.1, 5, by = 0.1), param = 'scale', dat = dat)
gpd.pll(psi = seq(20, 35, by = 0.1), param = 'quant', dat = dat, p = 0.01)
gpd.pll(psi = seq(20, 80, by = 0.1), param = 'ES', dat = dat, m = 100)
gpd.pll(psi = seq(15, 100, by = 1), param = 'Nmean', N = 100, dat = dat)
gpd.pll(psi = seq(15, 90, by = 1), param = 'Nquant', N = 100, dat = dat, q = 0.5)
## End(Not run)

Score vector for the generalized Pareto distribution

Description

Score vector for the generalized Pareto distribution

Usage

gpd.score(par, dat)

Arguments

par

vector ofscale andshape

dat

sample vector

See Also

gpd


Tangent exponential model approximation for the GP distribution

Description

The function gpd.tem provides a tangent exponential model (TEM) approximation for higher order likelihood inference for a scalar parameter of the generalized Pareto distribution. Options include the scale and shape parameters as well as value-at-risk (also referred to as quantiles or return levels) and expected shortfall. The function attempts to find good values for psi that will cover the range of options, but the fit may fail and return an error. In such cases, the user can try to find a good grid of starting values and provide it to the routine.

Usage

gpd.tem(
  dat,
  param = c("scale", "shape", "quant", "VaR", "retlev", "ES", "Nmean", "Nquant"),
  psi = NULL,
  m = NULL,
  thresh = 0,
  n.psi = 50,
  N = NULL,
  p = NULL,
  q = NULL,
  plot = FALSE,
  correction = TRUE,
  ...
)

Arguments

dat

sample vector for the GP distribution

param

parameter over which to profile

psi

scalar or ordered vector of values for the interest parameter. IfNULL (default), a grid of values centered at the MLE is selected. Ifpsi is of length 2 andn.psi>2, it is assumed to be the minimal and maximal values at which to evaluate the profile log likelihood.

m

number of observations of interest for return levels. SeeDetails. Required only forparam = 'VaR' orparam = 'ES'.

thresh

threshold value corresponding to the lower bound of the support or the location parameter of the generalized Pareto distribution.

n.psi

number of values ofpsi at which the likelihood is computed, ifpsi is not supplied (NULL). Odd values are more prone to give rise to numerical instabilities near the MLE

N

size of block over which to take maxima. Required only forargsNmean andNquant.

p

tail probability, equivalent to1/m. Required only forargsquant.

q

level of quantile for N-block maxima. Required only forargsNquant.

plot

logical indicating whetherplot.fr should be called upon exit

correction

logical indicating whetherspline.corr should be called.

...

additional arguments, for backward compatibility

Details

As of version 1.11, this function is a wrapper aroundgpd.pll.

The interpretation of m is as follows: if there are on average m_y observations per year above the threshold, then m = T m_y corresponds to the T-year return level.

Value

an invisible object of classfr (seetem in packagehoa) with elements

Author(s)

Leo Belzile

Examples

set.seed(123)
dat <- rgp(n = 40, scale = 1, shape = -0.1)
# with plots
m1 <- gpd.tem(param = 'shape', n.psi = 50, dat = dat, plot = TRUE)
## Not run: 
m2 <- gpd.tem(param = 'scale', n.psi = 50, dat = dat)
m3 <- gpd.tem(param = 'VaR', n.psi = 50, dat = dat, m = 100)
# Providing psi
psi <- c(seq(2, 5, length = 15), seq(5, 35, length = 45))
m4 <- gpd.tem(param = 'ES', dat = dat, m = 100, psi = psi, correction = FALSE)
mev:::plot.fr(m4, which = c(2, 4))
plot(fr4 <- spline.corr(m4))
confint(m1)
confint(m4, parm = 2, warn = FALSE)
m5 <- gpd.tem(param = 'Nmean', dat = dat, N = 100, psi = psi, correction = FALSE)
m6 <- gpd.tem(param = 'Nquant', dat = dat, N = 100, q = 0.7, correction = FALSE)
## End(Not run)

Tangent exponential model statistics for the generalized Pareto distribution

Description

Matrix of approximate ancillary statistics, sample space derivative of thelog likelihood and mixed derivative for the generalized Pareto distribution.

Usage

gpd.Vfun(par, dat)
gpd.phi(par, dat, V)
gpd.dphi(par, dat, V)

Arguments

par

vector ofscale andshape

dat

sample vector

V

vector calculated bygpd.Vfun

See Also

gpd


Generalized Pareto distribution (mean of maximum of N exceedances parametrization)

Description

Likelihood, score function and information matrix, approximate ancillary statistics and sample space derivative for the generalized Pareto distribution parametrized in terms of the average maximum of N exceedances.

The parameter N corresponds to the number of threshold exceedances of interest over which the maximum is taken; z is the expected value of this block maximum. Note that the actual parametrization is in terms of the excess expected mean, that is, the expected mean minus the threshold.

Arguments

par

vector of length 2 containingz and\xi, respectively the mean excess of the maxima of N exceedances above the threshold and the shape parameter.

dat

sample vector

N

block size for threshold exceedances.

tol

numerical tolerance for the exponential model

V

vector calculated bygpdN.Vfun

Details

The observed information matrix was calculated from the Hessian using symbolic calculus in Sage.

Usage

gpdN.ll(par, dat, N, tol = 1e-5)
gpdN.score(par, dat, N)
gpdN.infomat(par, dat, N, method = c('obs', 'exp'), nobs = length(dat))
gpdN.Vfun(par, dat, N)
gpdN.phi(par, dat, N, V)
gpdN.dphi(par, dat, N, V)

Functions

Author(s)

Leo Belzile


Information matrix of the generalized Pareto distribution (mean of maximum of N exceedances parametrization)

Description

Information matrix of the generalized Pareto distribution (mean of maximum of N exceedances parametrization)

Usage

gpdN.infomat(par, dat, N, method = c("obs", "exp"), nobs = length(dat))

Arguments

par

vector of length 2 containingz and\xi, respectively the mean excess of the maxima of N exceedances above the threshold and the shape parameter.

dat

sample vector

N

block size for threshold exceedances.

See Also

gpdN


Negative log likelihood of the generalized Pareto distribution (mean of maximum of N exceedances parametrization)

Description

Negative log likelihood of the generalized Pareto distribution (mean of maximum of N exceedances parametrization)

Usage

gpdN.ll(par, dat, N)

Arguments

par

vector of length 2 containingz and\xi, respectively the mean excess of the maxima of N exceedances above the threshold and the shape parameter.

dat

sample vector

N

block size for threshold exceedances.

See Also

gpdN


This function returns the mean of N maxima from the GP.

Description

This function returns the mean of N maxima from the GP.

Usage

gpdN.mean(par, N)

Arguments

par

vector of length 2 containingz and\xi, respectively the mean excess of the maxima of N exceedances above the threshold and the shape parameter.

N

block size for threshold exceedances.

See Also

gpdN


This function returns the qth percentile of N maxima from the GP.

Description

This function returns the qth percentile of N maxima from the GP.

Usage

gpdN.quant(par, q, N)

Arguments

par

vector of length 2 containingz and\xi, respectively the mean excess of the maxima of N exceedances above the threshold and the shape parameter.

q

probability level of the quantile

N

block size for threshold exceedances.

See Also

gpdN


Score vector of the generalized Pareto distribution (mean of maximum of N exceedances parametrization)

Description

Score vector of the generalized Pareto distribution (mean of maximum of N exceedances parametrization)

Usage

gpdN.score(par, dat, N)

Arguments

par

vector of length 2 containingz and\xi, respectively the mean excess of the maxima of N exceedances above the threshold and the shape parameter.

dat

sample vector

N

block size for threshold exceedances.

See Also

gpdN


Tangent exponential model statistics for the generalized Pareto distribution (mean of maximum of N exceedances parametrization)

Description

Vector implementing conditioning on approximate ancillary statistics for the TEM

Usage

gpdN.Vfun(par, dat, N)
gpdN.phi(par, dat, N, V)
gpdN.dphi(par, dat, N, V)

Arguments

par

vector of length 2 containingz and\xi, respectively the mean excess of the maxima of N exceedances above the threshold and the shape parameter.

dat

sample vector

N

block size for threshold exceedances.

V

vector calculated bygpdN.Vfun

See Also

gpdN


Generalized Pareto distribution (expected shortfall parametrization)

Description

Likelihood, score function and information matrix, approximate ancillary statistics and sample space derivative for the generalized Pareto distribution parametrized in terms of expected shortfall.

The parameter m corresponds to \zeta_u/(1-\alpha), where \zeta_u is the rate of exceedance over the threshold u and \alpha is the percentile of the expected shortfall. Note that the actual parametrization is in terms of the excess expected shortfall, meaning the expected shortfall minus the threshold.

Arguments

par

vector of length 2 containinge_m and\xi, respectively the expected shortfall at probability 1/(1-\alpha) and the shape parameter.

dat

sample vector

m

number of observations of interest for return levels. SeeDetails

tol

numerical tolerance for the exponential model

method

string indicating whether to use the expected ('exp') or the observed ('obs' - the default) information matrix.

nobs

number of observations

V

vector calculated bygpde.Vfun

Details

The observed information matrix was calculated from the Hessian using symbolic calculus in Sage.

Usage

gpde.ll(par, dat, m, tol = 1e-5)
gpde.ll.optim(par, dat, m, tol = 1e-5)
gpde.score(par, dat, m)
gpde.infomat(par, dat, m, method = c('obs', 'exp'), nobs = length(dat))
gpde.Vfun(par, dat, m)
gpde.phi(par, dat, V, m)
gpde.dphi(par, dat, V, m)

Functions

Author(s)

Leo Belzile


Observed information matrix for the GP distribution (expected shortfall)

Description

The information matrix is parametrized in terms of excess expected shortfall and shape

Usage

gpde.infomat(par, dat, m, method = c("obs", "exp"), nobs = length(dat))

Arguments

par

vector of length 2 containinge_m and\xi, respectively the expected shortfall at probability 1/(1-\alpha) and the shape parameter.

dat

sample vector

m

number of observations of interest for return levels. SeeDetails

method

string indicating whether to use the expected ('exp') or the observed ('obs' - the default) information matrix.

nobs

number of observations

See Also

gpde


Negative log likelihood of the generalized Pareto distribution (expected shortfall)

Description

Negative log likelihood of the generalized Pareto distribution (expected shortfall)

gpde.ll.optim is the version used for optimization: the negative log likelihood is parametrized in terms of the log expected shortfall and the shape in order to perform unconstrained optimization.

Usage

gpde.ll(par, dat, m)
gpde.ll.optim(par, dat, m)

Arguments

par

vector of length 2 containinge_m and\xi, respectively the expected shortfall at probability 1/(1-\alpha) and the shape parameter.

dat

sample vector

m

number of observations of interest for return levels. SeeDetails

See Also

gpde


Score vector for the GP distribution (expected shortfall)

Description

Score vector for the GP distribution (expected shortfall)

Usage

gpde.score(par, dat, m)

Arguments

par

vector of length 2 containinge_m and\xi, respectively the expected shortfall at probability 1/(1-\alpha) and the shape parameter.

dat

sample vector

m

number of observations of interest for return levels. SeeDetails

See Also

gpde


Tangent exponential model statistics for the generalized Pareto distribution (expected shortfall)

Description

Vector implementing conditioning on approximate ancillary statistics for the TEM

Usage

gpde.Vfun(par, dat, m)
gpde.phi(par, dat, V, m)
gpde.dphi(par, dat, V, m)

Arguments

par

vector of length 2 containinge_m and\xi, respectively the expected shortfall at probability 1/(1-\alpha) and the shape parameter.

dat

sample vector

m

number of observations of interest for return levels. SeeDetails

V

vector calculated bygpde.Vfun

See Also

gpde


Generalized Pareto distribution

Description

Density function, distribution function, quantile function andrandom number generation for the generalized Paretodistribution.

Usage

pgp(q, loc = 0, scale = 1, shape = 0, lower.tail = TRUE, log.p = FALSE)
dgp(x, loc = 0, scale = 1, shape = 0, log = FALSE)
qgp(p, loc = 0, scale = 1, shape = 0, lower.tail = TRUE)
rgp(n, loc = 0, scale = 1, shape = 0)

Arguments

loc

location parameter.

scale

scale parameter, strictly positive.

shape

shape parameter.

lower.tail

logical; ifTRUE (default), the lower tail probability\Pr(X \leq x) is returned.

log.p,log

logical; ifFALSE (default), values are returned on the probability scale.

x,q

vector of quantiles

p

vector of probabilities

n

scalar number of observations

References

Coles, S. G. (2001). An Introduction to Statistical Modeling of Extreme Values, Springer-Verlag, London. doi:10.1007/978-1-4471-3675-0_3
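
A short illustration of the four functions (parameter values chosen arbitrarily):

x <- rgp(n = 5, scale = 2, shape = 0.1)
dgp(x, scale = 2, shape = 0.1, log = TRUE) # log density
pgp(x, scale = 2, shape = 0.1)             # distribution function
qgp(0.99, scale = 2, shape = 0.1)          # 99th percentile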


Generalized Pareto distribution (return level parametrization)

Description

Likelihood, score function and information matrix, approximate ancillary statistics and sample space derivative for the generalized Pareto distribution parametrized in terms of return levels.

Arguments

par

vector of length 2 containingy_m and\xi, respectively them-year return level and the shape parameter.

dat

sample vector

m

number of observations of interest for return levels. SeeDetails

tol

numerical tolerance for the exponential model

method

string indicating whether to use the expected ('exp') or the observed ('obs' - the default) information matrix.

nobs

number of observations

V

vector calculated bygpdr.Vfun

Details

The observed information matrix was calculated from the Hessian using symbolic calculus in Sage.

The interpretation of m is as follows: if there are on average m_y observations per year above the threshold, then m = T m_y corresponds to the T-year return level.

Usage

gpdr.ll(par, dat, m, tol = 1e-5)
gpdr.ll.optim(par, dat, m, tol = 1e-5)
gpdr.score(par, dat, m)
gpdr.infomat(par, dat, m, method = c('obs', 'exp'), nobs = length(dat))
gpdr.Vfun(par, dat, m)
gpdr.phi(par, V, dat, m)
gpdr.dphi(par, V, dat, m)

Functions

Author(s)

Leo Belzile


Observed information matrix for GP distribution (return levels)

Description

The information matrix is parametrized in terms of the m-year return level and the shape

Usage

gpdr.infomat(par, dat, m, method = c("obs", "exp"), nobs = length(dat))

Arguments

par

vector of length 2 containingy_m and\xi, respectively them-year return level and the shape parameter.

dat

sample vector

m

number of observations of interest for return levels. SeeDetails

method

string indicating whether to use the expected ('exp') or the observed ('obs' - the default) information matrix.

nobs

number of observations

See Also

gpdr


Negative log likelihood of the generalized Pareto distribution (return levels)

Description

Negative log likelihood of the generalized Pareto distribution (return levels)

Negative log likelihood parametrized in terms of log return level and shape in order to perform unconstrained optimization

Usage

gpdr.ll(par, dat, m)
gpdr.ll.optim(par, dat, m)

Arguments

par

vector of length 2 containingy_m and\xi, respectively them-year return level and the shape parameter.

dat

sample vector

m

number of observations of interest for return levels. SeeDetails

See Also

gpdr


Score of the profile log likelihood for the GP distribution (return levels parametrization)

Description

Score of the profile log likelihood for the GP distribution (return levels parametrization)

Usage

gpdr.score(par, dat, m)

Arguments

par

vector of length 2 containingy_m and\xi, respectively them-year return level and the shape parameter.

dat

sample vector

m

number of observations of interest for return levels. SeeDetails

See Also

gpdr


Tangent exponential model statistics for the generalized Pareto distribution (return level)

Description

Vector implementing conditioning on approximate ancillary statistics for the TEM

Usage

gpdr.Vfun(par, dat, m)
gpdr.phi(par, dat, V, m)
gpdr.dphi(par, dat, V, m)

Arguments

par

vector of length 2 containingy_m and\xi, respectively them-year return level and the shape parameter.

dat

sample vector

m

number of observations of interest for return levels. SeeDetails

V

vector calculated bygpdr.Vfun

See Also

gpdr



Transformation from the generalized Pareto to unit Pareto

Description

Transformation from the generalized Pareto to unit Pareto

Usage

gpdtopar(dat, loc = 0, scale, shape, lambdau = 1)

Arguments

dat

vector or matrix of data

loc

vector of location parameters

scale

vector of scale parameters, strictly positive

shape

shape parameter

lambdau

vector of probability of marginal threshold exceedance

Value

a vector or matrix of the same dimension asdat with unit Pareto observations
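
A minimal sketch (relying on the unit Pareto output described above): the proportion of transformed observations exceeding 10 should be roughly 0.1.

set.seed(505)
dat <- rgp(n = 1000, loc = 1, scale = 2, shape = 0.2)
pdat <- gpdtopar(dat, loc = 1, scale = 2, shape = 0.2)
mean(pdat > 10) # approximately 0.1 for unit Pareto margins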


Interpret bivariate threshold exceedance models

Description

This is an adaptation of the interpret.gpdbiv function from the evir package. The function deals with the output of a call to fbvpot from the evd package and handles families other than the logistic distribution. The likelihood derivation comes from expression 2.10 in Smith et al. (1997).

Usage

ibvpot(fitted, q, silent = FALSE)

Arguments

fitted

the output offbvpot or a list. See Details.

q

a vector of quantiles to consider, on the data scale. Must be greater than the thresholds.

silent

boolean; whether to print the interpretation of the result. Default toFALSE.

Details

The listfitted must contain

Value

an invisible numeric vector containing marginal, joint and conditional exceedance probabilities.

Author(s)

Leo Belzile, adapting original S code by Alexander McNeil

References

Smith, Tawn and Coles (1997), Markov chain models for threshold exceedances.Biometrika,84(2), 249–268.

See Also

interpret.gpdbiv in packageevir

Examples

if (requireNamespace("evd", quietly = TRUE)) {y <- rgp(1000,1,1,1)x <- y*rmevspec(n=1000,d=2,sigma=cbind(c(0,0.5),c(0.5,0)), model='hr')mod <- evd::fbvpot(x, threshold = c(1,1), model = 'hr', likelihood ='censored')ibvpot(mod, c(20,20))}

Information matrix test statistic and MLE for the extremal index

Description

The information matrix test (IMT), proposed by Suveges and Davison (2010), is based on the difference between the expected quadratic score and the second derivative of the log likelihood. For each threshold u and gap K, the statistic is asymptotically \chi^2 with one degree of freedom. The approximation is good for N > 80 and conservative for smaller sample sizes. The test assumes independence between gaps.

Usage

infomat.test(xdat, thresh, q, K, plot = TRUE, ...)

Arguments

xdat

data vector

thresh

threshold vector

q

vector of probability levels to define threshold ifthresh is missing.

K

int specifying the largest K-gap

plot

logical: should the graphical diagnostic be plotted?

...

additional arguments, currently ignored

Details

The procedure proposed in Suveges & Davison (2010) was corrected for errata. The maximum likelihood estimator is based on the limiting mixture distribution of the intervals between exceedances (an exponential with a point mass at zero). The condition D^{(K)}(u_n) should be checked by the user.

Fukutome et al. (2015) propose an ad hoc automated procedure

  1. Calculate the interexceedance times for each K-gap and each threshold, along with the number of clusters

  2. Select the (u,K) pairs for which IMT < 0.05 (corresponding to a P-value of 0.82)

  3. Among those, select the pair (u,K) for which the number of clusters is the largest

Value

an invisible list of matrices containing

Author(s)

Leo Belzile

References

Fukutome, Liniger and Suveges (2015), Automatic threshold and run parameter selection: a climatology for extreme hourly precipitation in Switzerland.Theoretical and Applied Climatology,120(3), 403-416.

Suveges and Davison (2010), Model misspecification in peaks over threshold analysis.Annals of Applied Statistics,4(1), 203-221.

White (1982), Maximum Likelihood Estimation of Misspecified Models.Econometrica,50(1), 1-25.

Examples

infomat.test(xdat = rgp(n = 10000),             q = seq(0.1, 0.9, length = 10),             K = 3)

Intensity function for the Brown-Resnick model

Description

The intensity function includes the normalizing constants

Usage

intensBR(tdat, Lambda, cholPrecis = NULL)

Arguments

tdat

matrix of unit Pareto observations

Lambda

conditionally negative definite parameter matrix of the Huesler–Reiss model

cholPrecis

Cholesky root of the corresponding precision matrixsolve(Sigma). Default toNULL, meaning the latter is calculated within the function

Value

log intensity contribution


Intensity function for the extremal Student model

Description

The intensity function includes the normalizing constants

Usage

intensXstud(tdat, df, Sigma, cholPrecis = NULL)

Arguments

tdat

matrix of unit Pareto observations

df

degrees of freedom, must be larger than 1

Sigma

scale matrix

cholPrecis

Cholesky root of the precision matrixsolve(Sigma). Default toNULL, meaning the latter is calculated within the function

Value

log intensity contribution


Jacobian of the transformation from generalized Pareto to unit Pareto distribution

Description

If dat is a vector, the arguments loc, scale and shape should be numeric scalars (vectors of length 1).

Usage

jac_gpd_pareto(dat, loc = 0, scale, shape, lambdau = 1, censored)

Arguments

dat

vector or matrix of data

loc

vector of location parameters

scale

vector of scale parameters, strictly positive

shape

shape parameter

lambdau

vector of probability of marginal threshold exceedance

censored

a matrix of logical indicating whether the observations are censored

Value

log-likelihood contribution for the Jacobian


Estimators of the tail coefficient

Description

Estimators proposed by Krupskii and Joe under second order expansionfor the coefficient of tail dependence\eta and thejoint tail orthant probability

Usage

kjtail(
  xdat,
  qlev,
  ptail = NULL,
  mqu = NULL,
  type = 1,
  ties.method = eval(formals(rank)$ties.method),
  ...
)

Arguments

xdat

a matrix of observations

qlev

vector of quantile levels

ptail

tail probability smaller thanqlev. Default toNULL

mqu

marginal quantile levels for semiparametric estimation; data above this are modelled using a generalized Pareto distribution. IfNULL, empirical estimation is used throughout

type

integer indicating the estimator type

ties.method

method for handling of ties in rank transformation

...

additional arguments, for backward compatibility

Value

a list with elements

Note

EXPERIMENTAL. The numerical optimization of the likelihood surface isdifficult, as the function is ill-behaved. Visual inspection of estimates isoften necessary to check for possible lack of convergence.

Examples

d <- 2
rho <- 0.9
Sigma <- matrix(rho, d, d) + diag(1 - rho, d)
eta_true <- 1/sum(Sigma)
data <- rmnorm(
   n = 1e4,
   mu = rep(0, d),
   Sigma = Sigma)
q <- seq(0.95, 0.995, by = 0.005)
kj <- kjtail(xdat = data, qlev = q)
plot(kj)
abline(h = (1 + rho)/2, col = 2)

Estimation of the bivariate angular dependence function of Wadsworth and Tawn (2013)

Description

Estimation of the bivariate angular dependence function of Wadsworth and Tawn (2013)

Usage

lambdadep(
  dat,
  qu = 0.95,
  method = c("hill", "mle", "bayes"),
  plot = TRUE,
  level = 0.95,
  ...
)

Arguments

dat

ann by2 matrix of multivariate observations

qu

quantile level on uniform scale at which to threshold data. Default to 0.95

method

string indicating the estimation method

plot

logical indicating whether to return the graph oflambda

level

level for confidence intervals, default to 0.95

...

additional arguments, used for backward compatibility

The confidence intervals are based on normal quantiles. The standard errors for the hill method are based on the asymptotic covariance, and those for the mle are derived using the delta method. Bayesian posterior predictive interval estimates are obtained using ratio-of-uniform sampling with flat priors: the shape parameters are constrained to lie within the triangle, as are the frequentist point estimates, which are adjusted post-inference.

Value

a plot of the lambda function ifplot=TRUE, plus an invisible list with components

Examples

## Not run: 
set.seed(12)
dat <- rmev(n = 1000, d = 2, model = "log", param = 0.1)
lambdadep(dat, method = 'hill')
lambdadep(dat, method = 'bayes')
lambdadep(dat, method = 'mle')
# With independent observations
dat <- matrix(runif(n = 2000), ncol = 2)
lambdadep(dat, method = 'hill')
## End(Not run)

Leeds air pollution

Description

Daily maximum data (hourly for PM10) on air pollution for the Leeds Centre station in Yorkshire and Humberside. The data cover January 1st, 1993, to December 31st, 2024, show seasonality, and contain some outliers. From December 2nd, 2008 onwards, particulate matter (PM10 and PM2.5) is measured using a tapered element oscillating microbalance (TEOM) and a Filter Dynamics Measurement System (FDMS). The PM2.5 data are missing before the change of instrumentation. A total of 231 daily records containing only missing values were removed during preprocessing.

Usage

leedspollution

Format

A data frame with 11455 rows and 8 variables:

date

[character] a date with format yyyy-mm-dd

O3

[integer] ozone (in nanograms per cubic meter)

NO

[integer] nitrogen oxide (in nanograms per cubic meter)

CO

[double] carbon monoxide (in micrograms per cubic meter)

NO2

nitrogen dioxide (in nanograms per cubic meter)

SO2

sulphur dioxide (in nanograms per cubic meter)

PM10

[integer] particulate matter 10, (in nanograms per cubic meter)

PM2.5

[integer] particulate matter 2.5, (in nanograms per cubic meter)

Source

Crown 2025 copyright Defra viauk-air.defra.gov.uk, licenced under the Open Government Licence (OGL).


Likelihood for multivariate peaks over threshold models

Description

Likelihood for the various parametric limiting models over region determined by

\{y \in F: \max_{j=1}^D \sigma_j \frac{y^\xi_j-1}{\xi_j}+\mu_j > u\};

where\mu isloc,\sigma isscale and\xi isshape.

Usage

likmgp(
  dat,
  thresh,
  loc,
  scale,
  shape,
  par,
  model = c("log", "br", "xstud"),
  likt = c("mgp", "pois", "binom"),
  lambdau = 1,
  ...
)

Arguments

dat

matrix of observations

thresh

functional threshold for the maximum

loc

vector of location parameter for the marginal generalized Pareto distribution

scale

vector of scale parameter for the marginal generalized Pareto distribution

shape

vector of shape parameter for the marginal generalized Pareto distribution

par

list of parameters:alpha for the logistic model,Lambda for the Brown–Resnick model or elseSigma anddf for the extremal Student.

model

string indicating the model family, one of"log","neglog","br" or"xstud"

likt

string indicating the type of likelihood, with an additional contribution for the non-exceeding components: one of"mgp","binom" and"pois".

lambdau

vector of marginal rate of marginal threshold exceedance.

...

additional arguments (see Details)

Details

Optional arguments can be passed to the function via...

Value

the value of the log-likelihood withattributesexpme, giving the exponent measure

Note

The location and scale parameters are not identifiable unless one of them is fixed.


Maiquetia Daily Rainfall

Description

Daily cumulated rainfall (in mm) at Maiquetia airport, Venezuela. The observations cover the period from January 1961 to December 1999. The original series had missing days in February 1996 (during which there were 2 days with 1 hour each of light rain) and January 1998 (no rain); these were replaced by zeros.

Format

a vector of size 14244 containing daily rainfall (in mm).

Source

J.R. Cordova and M. González, accessed 25.11.2018 from <https://rss.onlinelibrary.wiley.com/hub/journal/14679876/series-c-datasets>

References

Coles, S. and L.R. Pericchi (2003). Anticipating Catastrophes through Extreme Value Modelling,Applied Statistics,52(4), 405-416.

Coles, S., Pericchi L.R. and S. Sisson (2003). A fully probabilistic approach to extreme rainfall modeling,Journal of Hydrology,273, 35-50.

Examples

## Not run: 
data(maiquetia, package = "mev")
day <- seq.Date(from = as.Date("1961-01-01"), to = as.Date("1999-12-31"), by = "day")
nzrain <- maiquetia[substr(day, 1, 4) < 1999 & maiquetia > 0]
fit.gpd(nzrain, threshold = 30, show = TRUE)
## End(Not run)

Transform arguments using max stability

Description

Given a vector of location, scale and shape parameters, compute the corresponding parameters for blocks of size m, assuming a generalized extreme value distribution.

Usage

maxstable(pars, m = 1L, inverse = FALSE)

Arguments

pars

vector of location, scale and shape parameters

m

[integer] block size

inverse

[logical] whether to compute the parameters for the inverse relationship (defaults toFALSE)

Examples

maxstable(pars = maxstable(pars = c(1, 2, 0), m = 10), m = 10, inv = TRUE)
maxstable(pars = maxstable(pars = c(1, 2, 0.1), m = 5), m = 1/5)

P-P plot for testing max stability

Description

The diagnostic, proposed by Gabda, Towe, Wadsworth and Tawn, relies on the fact that, for max-stable vectors on the unit Gumbel scale, the distribution of the maxima is Gumbel with a location parameter equal to the exponent measure. One can thus consider tuples of size m, estimate the location parameter via maximum likelihood and transform observations to the standard Gumbel scale. Replicates are then pooled and empirical quantiles are defined. The number of combinations of m vectors can be prohibitively large, hence only nmax randomly selected tuples are used out of all possible combinations. The confidence intervals are obtained by a nonparametric bootstrap, resampling observations with replacement for the selected tuples and re-estimating the location parameter. The procedure can be computationally intensive as a result.

Usage

maxstabtest(
  dat,
  m = prod(dim(dat)[-1]),
  nmax = 500L,
  B = 1000L,
  ties.method = "random",
  plot = TRUE
)

Arguments

dat

matrix or array of max-stable observations, typically block maxima. The first dimension should consist of replicates

m

integer indicating how many tuples should be aggregated.

nmax

maximum number of pairs. Default to 500L.

B

number of nonparametric bootstrap replications. Default to 1000L.

ties.method

string indicating the method forrank. Default to"random".

plot

logical indicating whether a graph should be produced (default toTRUE).

Value

a Tukey probability-probability plot with 95% confidence intervals obtained from the nonparametric bootstrap

References

Gabda, D.; Towe, R. Wadsworth, J. and J. Tawn, Discussion of “Statistical Modeling of Spatial Extremes” by A. C. Davison, S. A. Padoan and M. Ribatet.Statist. Sci.27 (2012), no. 2, 189–192.
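
A minimal sketch (max-stable replicates simulated with rmev, as in other examples in this manual); the bootstrap makes the call somewhat slow.

set.seed(606)
# 100 replicates of a 5-dimensional logistic max-stable vector
dat <- rmev(n = 100, d = 5, param = 0.5, model = "log")
maxstabtest(dat, m = 2, nmax = 100L, B = 200L)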


Multivariate Normal distribution sampler

Description

Sampler derived using the eigendecomposition of the covariancematrixSigma. The function uses the Armadillo random normal generator

Usage

mvrnorm(n, mu, Sigma)

Arguments

n

sample size

mu

mean vector. Will set the dimension

Sigma

a square covariance matrix, of same dimension asmu.No sanity check is performed to validate that the matrix is positive definite, so use at own risk

Value

ann sample from a multivariate Normal distribution

Examples

mvrnorm(n = 10, mu = c(0,2), Sigma = diag(2))

River Nidd Flow

Description

The data consists of exceedances over the threshold 65 cubic meter per second of the River Nidd at Hunsingore Weir, for 35 years of data between 1934 and 1969.

Format

a vector of size 154

Source

Natural Environment Research Council (1975).Flood Studies Report, volume 4. pp. 235–236.

References

Davison, A.C. and R.L. Smith (1990). Models for Exceedances over High Thresholds (with discussion),Journal of the Royal Statistical Society. Series B (Methodological),52(3), 393–442.

See Also

nidd.thresh from theevir package


Nutrient data

Description

Interview component of the survey 'What we eat in America'. These data are extracted from the 2015–2016 National Health and Nutrition Examination Survey (NHANES, https://wwwn.cdc.gov/nchs/nhanes/Default.aspx) report and consist of the total nutrients for all food and beverage intake ingested over a 24 hour period.

Usage

nutrients

Format

A data frame with 9544 rows and 38 variables:

prot

proteins (in grams)

carb

carbohydrates (in grams)

sugr

total sugars (in grams)

fibe

dietary fibers (in grams)

tfat

total fat (in grams)

sfat

saturated fat (in grams)

mfat

monounsaturated fat (in grams)

pfat

polyunsaturated fat (in grams)

chol

cholesterol (in milligrams)

atoc

vitamin E as alpha-tocopherol (in milligrams)

ret

retinol (in micrograms)

vara

Vitamin A as retinol activity equivalents (in micrograms).

acar

alpha-carotene (in micrograms)

bcar

beta-carotene (in micrograms)

cryp

beta-cryptoxanthin (in micrograms)

lyco

lycopene (in micrograms)

lz

lutein and zeaxanthin (in micrograms).

vb1

thiamin (vitamin B1, in milligrams)

vb2

riboflavin (vitamin B2, in milligrams)

niac

niacin (in milligrams)

vb6

vitamin B6 (in milligrams)

fola

total folate (in micrograms)

fa

folic acid (in micrograms)

ff

food folate (in micrograms)

chl

total choline (in milligrams)

vb12

vitamin B12 (in micrograms)

vc

vitamin C (in milligrams)

vd

vitamin D (comprising D2 and D3, in micrograms)

vk

vitamin K (in micrograms)

calc

calcium (in milligrams)

phos

phosphorus (in milligrams)

magn

magnesium (in milligrams)

iron

iron (in milligrams)

zinc

zinc (in milligrams)

copp

copper (in milligrams)

sodi

sodium (in milligrams)

pota

potassium (in milligrams)

sele

selenium (in micrograms)

Details

Note that the sample design oversampled specific population targets and that only respondents are provided. The website contains more information about sampling weights. There are multiple missing records.

Note

These data are subject to a data user agreement, available athttps://www.cdc.gov/nchs/policy/data-user-agreement.html

Source

National Center for Health Statistics, now available from the Wayback Machine viahttps://web.archive.org/web/20201029113801/https://wwwn.cdc.gov/Nchs/Nhanes/2015-2016/DR1TOT_I.XPT


Deaths from pandemics

Description

The data base contains estimated records of the number of deaths from pandemics.

Usage

pandemics

Format

A data frame with 72 rows and 8 variables:

event

name of the event

startyear

start year of the event

endyear

end year of the event

lower

lower bound on estimated deaths (in thousands)

average

average estimated deaths (in thousands)

upper

upper bound on estimated deaths (in thousands)

saverage

scaled average of estimated deaths (in thousands)

population

estimated population at risk (in thousands)

Source

Cirillo, P. and N.N. Taleb (2020).Tail risk of contagious diseases. Nat. Phys.16, 606–613 (2020). <doi:10.1038/s41567-020-0921-x>


Smith's penultimate approximations

Description

The function takes as arguments the distribution and density functions. There are two options: method = 'bm' yields block maxima and method = 'pot' threshold exceedances. For method = 'bm', the user should provide the block sizes via the argument m, whereas for method = 'pot', a vector of threshold values should be provided. The other argument (thresh or m, depending on the method) is ignored.

Usage

penultimate(family, method = c("bm", "pot"), thresh, qlev, m, ...)

Arguments

family

the name of the parametric family. Will be used to obtaindfamily,pfamily,qfamily

method

either block maxima ('bm') or peaks-over-threshold ('pot') are supported

thresh

vector of thresholds for method'pot'

qlev

vector of quantile levels for method'pot', e.g., 0.9, 0.95, ... Ignored if argumentthresh is provided.

m

vector of block sizes for method'bm'

...

additional arguments passed todensF anddistF

Details

Alternatively, the user can provide functions densF, quantF and distF for the density, quantile function and distribution function, respectively. The user can also supply the derivative of the density function, ddensF. If the latter is missing, it will be approximated using finite differences.

Formethod = "pot", the function computes the reciprocal hazard and its derivative on the log scale to avoid numerical overflow. Thus, the density function should have argumentlog and the distribution function argumentslog.p andlower.tail, respectively.

Value

a data frame containing

Author(s)

Leo Belzile

References

Smith, R.L. (1987). Approximations in extreme value theory.Technical report 205, Center for Stochastic Process, University of North Carolina, 1–34.

Examples

# Threshold exceedance for Normal variables
quants <- seq(1, 5, by = 0.02)
penult <- penultimate(
   family = "norm",
   method = 'pot',
   thresh = quants,
   ddensF = function(x){-x*dnorm(x)}, # optional argument
   )
plot(x = quants,
     y = penult$shape,
     type = 'l',
     xlab = 'quantile',
     ylab = 'Penultimate shape',
     ylim = c(-0.5, 0))
# Block maxima for Gamma variables
# User must provide arguments for shape (or rate), for which there is no default
m <- seq(30, 3650, by = 30)
penult <- penultimate(family = 'gamma', method = 'bm', m = m, shape = 0.1)
plot(x = m,
     y = penult$shape,
     type = 'l',
     xlab = 'quantile',
     ylab = 'penultimate shape')
# Comparing density of GEV approximation with true density of maxima
m <- 100 # block of size 100
p <- penultimate(
  family = 'norm',
  ddensF = function(x){-x*dnorm(x)},
  method = 'bm',
  m = m)
x <- seq(1, 5, by = 0.01)
plot(
  x = x,
  y = m * dnorm(x) * exp((m-1) * pnorm(x, log.p = TRUE)),
  type = 'l',
  ylab = 'density',
  main = 'Distribution of the maxima of\n 100 standard normal variates')
lines(x, mev::dgev(x, loc = p$loc, scale = p$scale, shape = 0), col = 2)
lines(x, mev::dgev(x, loc = p$loc, scale = p$scale, shape = p$shape), col = 4)
legend(
  x = 'topright',
  lty = c(1, 1, 1),
  col = c(1, 2, 4),
  legend = c('exact', 'ultimate', 'penultimate'),
  bty = 'n')

Extended GP functions

Description

These functions are documented inextgp and inextgp.G for the carrier distributions supported in the unit interval.

Usage

pextgp.G(u, type = 1, prob, kappa, delta)
dextgp.G(u, type = 1, prob = NA, kappa = NA, delta = NA, log = FALSE)
qextgp.G(u, type = 1, prob = NA, kappa = NA, delta = NA)
rextgp.G(
  n,
  prob = NA,
  kappa = NA,
  delta = NA,
  type = 1,
  unifsamp = NULL,
  direct = FALSE,
  censoring = c(0, 1)
)
pextgp(q, prob = NA, kappa = NA, delta = NA, sigma = NA, xi = NA, type = 1)
dextgp(
  x,
  prob = NA,
  kappa = NA,
  delta = NA,
  sigma = NA,
  xi = NA,
  type = 1,
  log = FALSE
)
qextgp(
  p,
  prob = NA,
  kappa = NA,
  delta = NA,
  sigma = NA,
  xi = NA,
  type = 1,
  step = 0
)
rextgp(
  n,
  prob = NA,
  kappa = NA,
  delta = NA,
  sigma = NA,
  xi = NA,
  type = 1,
  unifsamp = NULL,
  censoring = c(0, Inf)
)

See Also

extgp,extgp.G


Plot of (modified) profile likelihood

Description

The function plots the (modified) profile likelihood and the tangent exponential profile likelihood

Usage

## S3 method for class 'eprof'
plot(x, ...)

Arguments

x

an object of classeprof returned bygpd.pll orgev.pll.

...

further arguments toplot.

Value

a graph of the (modified) profile likelihoods

References

Brazzale, A. R., Davison, A. C. and Reid, N. (2007).Applied Asymptotics: Case Studies in Small-Sample Statistics. Cambridge University Press, Cambridge.

Severini, T. A. (2000).Likelihood Methods in Statistics. Oxford University Press, Oxford.


Plot of tangent exponential model profile likelihood

Description

This function is adapted from the plot.fr function from the hoa package bundle. It differs from the latter mostly in the placement of legends.

Usage

## S3 method for class 'fr'
plot(x, ...)

Arguments

x

an object of classfr returned bygpd.tem orgev.tem.

...

further arguments toplot currently ignored. Providing a numeric vectorwhich allows for custom selection of the plots. A logicalall. SeeDetails.

Details

Plots produced depend on the integers provided in which. 1 displays the Wald pivot, the likelihood root r, the modified likelihood root rstar and the likelihood modification q as functions of the parameter psi. 2 gives the renormalized profile log likelihood and adjusted form, with the maximum likelihood having an ordinate value of zero. 3 provides the significance function, a transformation of 1. Lastly, 4 plots the correction factor as a function of the likelihood root; it is a diagnostic plot aimed at detecting failures of the asymptotic approximation, often due to poor numerics in a neighborhood of r = 0; the function should be smooth. The function spline.corr is designed to handle this by correcting numerically unstable estimates, replacing outliers and missing values with the fitted values from the fit.

Value

graphs depending on argumentwhich

References

Brazzale, A. R., Davison, A. C. and Reid, N. (2007).Applied Asymptotics: Case Studies in Small-Sample Statistics. Cambridge University Press, Cambridge.


Plots for random block maximum estimator

Description

The function returns a plot of the shape estimator along with the value (and 95% Wald-based confidence interval) at the selected threshold, or a plot of the empirical Bayes risk.

Usage

## S3 method for class 'mev_shape_rbm'
plot(x, type = c("shape", "risk"), log = TRUE, ...)

Arguments

x

object of classmev_shape_rbm returned byshape.rbm

type

[string] type of plot, either"shape" for the tail index or"risk" for the empirical Bayes risk

log

[logical] ifTRUE (default), the x-axis for the number of exceedances is displayed on the log scale.

...

additional arguments, currently ignored

Value

one or more plots


Plots for Varty and al. metric-based threshold selection

Description

This S3 method produces quantile-quantile plots with confidence and tolerance bands on various scales (uniform, exponential, generalized Pareto), or a plot of the metric as a function of the threshold.

Usage

## S3 method for class 'mev_thselect_vmetric'
plot(
  x,
  type = c("qq", "pp", "exp", "metric"),
  B = 1000L,
  probs = c(0.025, 0.975),
  ...
)

Arguments

x

an object of classmev_thselect_vmetric produced by a call tothselect.vmetric

type

string; a single string indicating the choice of plot

B

number of simulations for variability of estimation

probs

quantile levels for intervals.

...

additional arguments, currently ignored

Value

NULL; the function is used to produce a plot


Sequential analysis diagnostic plots for threshold selection

Description

Sequential analysis diagnostic plots for threshold selection

Usage

## S3 method for class 'mev_thselect_wadsworth'
plot(x, type = c("wn", "ps"), ...)

Arguments

x

object returned by a call tothselect.wseq

type

string giving the plots to produce

...

additional arguments passed to plotting function

Value

NULL; the method is used to generate plots


Mean residual life parameter stability plot

Description

Mean residual life parameter stability plot

Usage

## S3 method for class 'mev_tstab_mrl'
plot(
  x,
  xlab = c("thresh", "nexc"),
  level = 0.95,
  type = c("band", "ptwise"),
  ...
)

Arguments

x

object resulting from a call totstab.mrl

xlab

[string] whether to plot the mean residual life as a function of the threshold value or of the number of exceedances

level

[numeric] level of Wald confidence intervals

type

[string] whether to plot pointwise confidence intervals using segments ("ptwise") or using dashed lines ("band")

...

additional arguments, currently ignored

Value

NULL; use to produce plots


Power variogram model

Description

The power variogram model is

\gamma(h) = (\|h\|/\lambda)^\alpha, \quad \lambda>0, \alpha \in [0,2).

Usage

power.vario(h, alpha, scale = 1)

Arguments

h

vector or matrix of pairwise distances

alpha

smoothness parameter

scale

scale parameter

Value

a vector or matrix of variogram values of the same length ash
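
For instance, evaluating the variogram on a grid of distances:

h <- seq(0, 5, by = 0.5)
power.vario(h, alpha = 1.5, scale = 2) # grows as (h/2)^1.5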


Power exponential correlation model

Description

The power correlation model is

\rho(h) = \exp\{-(\|h\|/\lambda)^\alpha\}, \quad \lambda>0, \alpha \in [0,2).

Usage

powerexp.cor(h, alpha = 1, scale = 1)

Arguments

h

vector or matrix of pairwise distances

alpha

smoothness parameter

scale

scale parameter

Value

a vector or matrix of correlations of the same dimensions ash


Poisson process of extremes.

Description

Likelihood, score function and information matrix for the Poisson process likelihood.

Arguments

par

vector ofloc,scale andshape

dat

sample vector

u

threshold

method

string indicating whether to use the expected ('exp') or the observed ('obs' - the default) information matrix.

np

number of periods of observations. This is apost hoc adjustment for the intensity so that the parameters of the model coincide with those of a generalized extreme value distribution with block sizelength(dat)/np.

nobs

number of observations for the expected information matrix. Default tolength(dat) ifdat is provided.

Usage

pp.ll(par, dat)
pp.ll(par, dat, u, np)
pp.score(par, dat)
pp.infomat(par, dat, method = c('obs', 'exp'))

Functions

Author(s)

Leo Belzile

References

Coles, S. (2001).An Introduction to Statistical Modeling of Extreme Values, Springer, 209 p.

Wadsworth, J.L. (2016). Exploiting Structure of Maximum Likelihood Estimators for Extreme Value Threshold Selection,Technometrics,58(1), 116-126,http://dx.doi.org/10.1080/00401706.2014.998345.

Sharkey, P. and J.A. Tawn (2017). A Poisson process reparameterisation for Bayesian inference for extremes,Extremes,20(2), 239-263,http://dx.doi.org/10.1007/s10687-016-0280-2.


Information matrix for the Poisson process likelihood

Description

The function returns the expected or observed information matrix.

Usage

pp.infomat(par, dat, method = c("obs", "exp"), u, np = 1, nobs = length(dat))

Arguments

par

vector ofloc,scale andshape

dat

sample vector

method

string indicating whether to use the expected ('exp') or the observed ('obs' - the default) information matrix.

u

threshold

np

number of periods of observations. This is apost hoc adjustment for the intensity so that the parameters of the model coincide with those of a generalized extreme value distribution with block sizelength(dat)/np.

nobs

number of observations for the expected information matrix. Default tolength(dat) ifdat is provided.

Value

information matrix of the NHPP

Note

For the expected information matrix, the number of points above the threshold is random, but should correspond to np\Lambda. The parametrization of np is shared between fit.pp, pp.ll, etc. The entries of the information matrix are given in Sharkey and Tawn (2017), but contain some typos, which were corrected.

References

Sharkey, P. and J.A. Tawn (2017). A Poisson process reparameterisation for Bayesian inference for extremes,Extremes,20(2), 239-263,http://dx.doi.org/10.1007/s10687-016-0280-2.

See Also

pp

Examples

## Not run: 
dat <- rgp(n <- 1e3, 0.1, 2, -0.1)
np <- 10
mle <- fit.pp(dat, threshold = 0, np = np)$par
info_obs <- pp.infomat(par = mle, dat = dat, method = "obs", u = 0, np = np)
info_exp <- pp.infomat(par = mle, dat = dat, method = "exp", u = 0, np = np)
info_obs/info_exp
## End(Not run)

Log-likelihood of Poisson process of threshold exceedances

Description

This function returns the log-likelihood of the non-homogeneous Poisson process of exceedances above threshold u, adjusted so that there are np periods of observations.

Usage

pp.ll(par, dat, u, np = 1)

Arguments

par

vector of loc, scale and shape

dat

sample vector

u

threshold

np

number of periods of observations. This is a post hoc adjustment for the intensity so that the parameters of the model coincide with those of a generalized extreme value distribution with block size length(dat)/np.

Value

log-likelihood of the NHPP

See Also

pp
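
Examples

A minimal sketch (not an official package example), mirroring the pp.infomat example: evaluate the log-likelihood at the fitted parameters.

dat <- rgp(n = 1000, loc = 0, scale = 2, shape = 0.1)
np <- 10
mle <- fit.pp(dat, threshold = 0, np = np)$par
pp.ll(par = mle, dat = dat, u = 0, np = np)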


Score vector for the Poisson process of threshold exceedances

Description

Returns the score vector of the NHPP.

Usage

pp.score(par, dat, u, np = 1)

Arguments

par

vector ofloc,scale andshape

dat

sample vector

u

threshold

np

number of periods of observations. This is apost hoc adjustment for the intensity so that the parameters of the model coincide with those of a generalized extreme value distribution with block sizelength(dat)/np.

Value

score vector of NHPP

See Also

pp
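
Examples

A minimal sketch (not an official package example): at the maximum likelihood estimate, the score should be approximately zero.

dat <- rgp(n = 1000, loc = 0, scale = 2, shape = 0.1)
np <- 10
mle <- fit.pp(dat, threshold = 0, np = np)$par
pp.score(par = mle, dat = dat, u = 0, np = np)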


Weissman's quantile estimator

Description

Given a small probability of exceedance p, the number of exceedances k out of n observations above the threshold u (thresh), typically corresponding to the (k+1)th order statistic, compute the tail quantile at level Q(1-p) using the estimator of Weissman (1978) under the assumption of a Pareto tail (positive shape \xi), viz.

Q(1-p) = u \left(\frac{k}{pn}\right)^{\xi}.

Usage

qweissman(p, k, n, thresh, shape)

Arguments

p

tail probability, which must be smaller than the proportion of exceedances k/n.

k

vector of the number of exceedances above thresh

n

integer, total sample size

thresh

vector of thresholds

shape

vector of positive shape parameters

Value

a vector of tail quantiles

References

Weissman, I. (1978). Estimation of Parameters and Larger Quantiles Based on thek Largest Observations.Journal of the American Statistical Association, 73(364), 812–815. <doi:10.2307/2286285>.

Examples

set.seed(2025)
p <- 1/100
xdat <- rgp(n = 1000, loc = 2, scale = 2, shape = 0.4)
hill <- shape.hill(xdat, k = seq(20L, 100L, by = 10L))
thresh <- sort(xdat, decreasing = TRUE)[hill$k + 1]
qweissman(
   p = 1/100,
   k = hill$k,
   n = length(xdat),
   thresh = thresh,
   shape = hill$shape)
# Compare with true quantile
qgp(1/100, loc = 2, scale = 2, shape = 0.4, lower.tail = FALSE)

Random variate generation for the Dirichlet distribution on S_{d}

Description

A function to sample Dirichlet random variables, based on the representation as ratios of Gamma variates. Note that the RNG generates on the full simplex and the sum-to-one constraint is respected here.

Usage

rdir(n, alpha, normalize = TRUE)

Arguments

n

sample size

alpha

vector of parameter

normalize

boolean. If FALSE, the function returns Gamma variates with parameter alpha.

Value

a sample of dimension d (the length of alpha) from the Dirichlet distribution.

Examples

rdir(n = 100, alpha = c(0.5, 0.5, 2), TRUE)
rdir(n = 100, alpha = c(3, 1, 2), FALSE)

Simulation from generalized R-Pareto processes

Description

The generalized R-Pareto process is supported on (loc - scale / shape, Inf) if shape > 0, or (-Inf, loc - scale / shape) for negative shape parameters, conditional on (X - r(loc))/r(scale) > 0. The standard Pareto process corresponds to scale = loc = rep(1, d).

Usage

rgparp(
  n,
  shape = 1,
  thresh = 1,
  risk = c("mean", "sum", "site", "max", "min", "l2"),
  siteindex = NULL,
  d,
  loc,
  scale,
  param,
  sigma,
  model = c("log", "neglog", "bilog", "negbilog", "hr", "br", "xstud", "smith",
    "schlather", "ct", "sdir", "dirmix"),
  weights,
  vario,
  coord = NULL,
  ...
)

Arguments

n

number of observations

shape

shape parameter of the generalized Pareto variable

thresh

univariate threshold for the exceedances of risk functional

risk

string indicating the risk functional.

siteindex

integer between 1 and d specifying the index of the site or variable

d

dimension of sample

loc

location vector

scale

scale vector

param

parameter vector for the logistic, bilogistic, negative bilogistic and extremal Dirichlet (Coles and Tawn) models. Parameter matrix for the Dirichlet mixture. Degrees of freedom for the extremal Student model. See Details.

sigma

covariance matrix for Brown-Resnick and extremal Student-t distributions. Symmetric matrix of squared coefficients \lambda^2 for the Husler-Reiss model, with zero diagonal elements.

model

for multivariate extreme value distributions, users can choose between 1-parameter logistic and negative logistic, asymmetric logistic and negative logistic, bilogistic, Husler-Reiss, extremal Dirichlet model (Coles and Tawn) or the Dirichlet mixture. Spatial models include the Brown-Resnick, Smith, Schlather and extremal Student max-stable processes. Max-linear models are also supported.

weights

vector of length m for the m mixture components that sum to one. For the "maxlin" model, weights should be a matrix with d columns that represent the weights of the components and whose columns sum to one (if provided, this argument overrides asy).

vario

semivariogram function whose first argument must be distance. Used only if provided in conjunction with coord and if sigma is missing

coord

d by k matrix of coordinates, used as input in the variogram vario or as parameter for the Smith model. If grid is TRUE, unique entries should be supplied.

...

additional arguments for the vario function

Value

an n by d sample from the generalized R-Pareto process, with attributes accept.rate if the procedure uses rejection sampling.

Examples

rgparp(n = 10, risk = 'site', siteindex = 2, d = 3, param = 2.5,
   model = 'log', scale = c(1, 2, 3), loc = c(2, 3, 4), shape = 0.5)
rgparp(n = 10, risk = 'max', d = 4, param = c(0.2, 0.1, 0.9, 0.5),
   scale = 1:4, loc = 1:4, model = 'bilog')
rgparp(n = 10, risk = 'sum', d = 3, param = c(0.8, 1.2, 0.6, -0.5),
   scale = 1:3, loc = 1:3, model = 'sdir')
vario <- function(x, scale = 0.5, alpha = 0.8){ scale*x^alpha }
grid.coord <- as.matrix(expand.grid(runif(4), runif(4)))
rgparp(n = 10, risk = 'max', vario = vario, coord = grid.coord,
   model = 'br', scale = runif(16), loc = rnorm(16))

Second order tail index estimator of Drees and Kaufmann

Description

Estimator of the second order regular variation parameter \rho \leq 0 for heavy-tailed data, proposed by Drees and Kaufmann (1998).

Usage

rho.dk(xdat, k, tau = 0.5)

Arguments

xdat

vector of positive observations

k

number of highest order statistics to use for estimation

tau

tuning parameter \tau \in (0,1)

References

Drees, H. and E. Kaufmann (1998). Selecting the optimal sample fraction in univariate extreme value estimation,Stochastic Processes and their Applications, 75(2), 149-172, <doi:10.1016/S0304-4149(98)00017-9>.
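
Examples

A minimal sketch (not an official package example) on simulated Pareto-tailed data, using only the documented arguments.

xdat <- mev::rgp(n = 5000, shape = 0.5)
rho.dk(xdat, k = 1000, tau = 0.5)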


Second order tail index estimator of Fraga Alves et al.

Description

Estimator of the second order regular variation parameter \rho \leq 0 for heavy-tailed data, proposed by Fraga Alves et al. (2003).

Usage

rho.fagh(xdat, k, tau = 0)

Arguments

xdat

vector of positive observations

k

number of highest order statistics to use for estimation

tau

scalar real tuning parameter. The default value is 0, which is typically chosen whenever \rho \ge -1; the choice \tau = 1 is used otherwise.

References

Fraga Alves, M.I., Gomes, M. Ivette, and de Haan, Laurens (2003). A new class of semi-parametric estimators of the second order parameter.Portugaliae Mathematica. Nova Serie 60(2), 193-213. <http://eudml.org/doc/50867>.

Examples

# Example with rho = -0.2
n <- 1000
xdat <- mev::rgp(n = n, shape = 0.2)
kmin <- floor(n^0.995)
kmax <- ceiling(n^0.999)
rho_est <- rho.fagh(
   xdat = xdat,
   k = n - kmin:kmax)
rho_med <- mean(rho_est$rho)

Second order tail index estimator of Goegebeur et al. (2008)

Description

Estimator of the second order regular variation parameter \rho \leq 0 for heavy-tailed data, based on a ratio of kernel goodness-of-fit statistics.

Usage

rho.gbw(xdat, k)

Arguments

xdat

vector of positive observations

k

number of highest order statistics to use for estimation

References

Goegebeur, Y., J. Beirlant and T. de Wet (2008). Linking Pareto-tail kernel goodness-of-fit statistics with tail index at optimal threshold and second order estimation.REVSTAT-Statistical Journal, 6(1), 51-69. <doi:10.57805/revstat.v6i1.57>
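
Examples

A minimal sketch (not an official package example) on simulated Pareto-tailed data.

xdat <- mev::rgp(n = 5000, shape = 0.25)
rho.gbw(xdat, k = 1000)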


Second order tail index estimator of Gomes et al.

Description

Estimator of the second order regular variation parameter \rho \leq 0 for heavy-tailed data, proposed by Gomes, de Haan and Peng (2002).

Usage

rho.ghp(xdat, k, alpha = 2)

Arguments

xdat

vector of positive observations

k

number of highest order statistics to use for estimation

alpha

positive scalar tuning parameter

References

Gomes, M.I., de Haan, L. & Peng, L. (2002). Semi-parametric Estimation of the Second Order Parameter in Statistics of Extremes.Extremes 5, 387–414. <doi:10.1023/A:1025128326588>
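
Examples

A minimal sketch (not an official package example) with the default tuning parameter alpha = 2 on simulated generalized Pareto data.

xdat <- mev::rgp(n = 5000, shape = 0.3)
rho.ghp(xdat, k = 1000, alpha = 2)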


Distribution of the r-largest observations

Description

Likelihood, score function and information matrix for the r-largest observations likelihood.

Arguments

par

vector ofloc,scale andshape

dat

an n by r sample matrix, ordered from largest to smallest in each row

method

string indicating whether to use the expected ('exp') or the observed ('obs' - the default) information matrix.

nobs

number of observations for the expected information matrix. Default to nrow(dat) if dat is provided.

r

number of order statistics kept. Default to ncol(dat).

Usage

rlarg.ll(par, dat)
rlarg.score(par, dat)
rlarg.infomat(par, dat, method = c('obs', 'exp'), nobs = nrow(dat), r = ncol(dat))

Functions

Author(s)

Leo Belzile

References

Coles, S. (2001).An Introduction to Statistical Modeling of Extreme Values, Springer, 209 p.

Smith, R.L. (1986). Extreme value theory based on the r largest annual events,Journal of Hydrology,86(1-2), 27–43,http://dx.doi.org/10.1016/0022-1694(86)90004-1.


Information matrix for the r-largest observations.

Description

The function returns the expected or observed information matrix.

Usage

rlarg.infomat(
  par,
  dat,
  method = c("obs", "exp"),
  nobs = nrow(dat),
  r = ncol(dat)
)

Arguments

par

vector ofloc,scale andshape

dat

ann byr sample matrix, ordered from largest to smallest in each row

method

string indicating whether to use the expected ('exp') or the observed ('obs' - the default) information matrix.

nobs

number of observations for the expected information matrix. Default tonrow(dat) ifdat is provided.

r

number of order statistics kept. Default toncol(dat)

See Also

rlarg


Log-likelihood of the point process of r-largest observations

Description

The function returns the log-likelihood for an n by r matrix of observations.

Usage

rlarg.ll(par, dat)

Arguments

par

vector ofloc,scale andshape

dat

ann byr sample matrix, ordered from largest to smallest in each row

See Also

rlarg
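
Examples

A minimal sketch (not an official package example): simulate r-largest observations with rrlarg, documented later in this manual, and evaluate the log-likelihood at the true parameters.

dat <- rrlarg(n = 50, r = 5, loc = 0, scale = 1, shape = 0.1)
rlarg.ll(par = c(0, 1, 0.1), dat = dat)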


Score of the r-largest observations

Description

The score is computed via linear interpolation for the shape parameter in a neighborhood of zero

Usage

rlarg.score(par, dat)

Arguments

par

vector ofloc,scale andshape

dat

ann byr sample matrix, ordered from largest to smallest in each row

Value

score vector of size 3

See Also

rlarg


Exact simulations of multivariate extreme value distributions

Description

Implementation of the random number generators for multivariate extreme-value distributions and max-stable processes based on the two algorithms described in Dombry, Engelke and Oesting (2016).

Usage

rmev(
  n,
  d,
  param,
  asy,
  sigma,
  model = c("log", "alog", "neglog", "aneglog", "bilog", "negbilog", "hr", "br", "xstud",
    "smith", "schlather", "ct", "sdir", "dirmix", "pairbeta", "pairexp", "wdirbs",
    "wexpbs", "maxlin"),
  alg = c("ef", "sm"),
  weights = NULL,
  vario = NULL,
  coord = NULL,
  grid = FALSE,
  dist = NULL,
  ...
)

Arguments

n

number of observations

d

dimension of sample

param

parameter vector for the logistic, bilogistic, negative bilogistic and extremal Dirichlet (Coles and Tawn) models. Parameter matrix for the Dirichlet mixture. Degrees of freedom for the extremal Student model. See Details.

asy

list of asymmetry parameters, as in functionrmvevd from packageevd, of2^d-1 vectors of size corresponding to the power set ofd, with sum to one constraints.

sigma

covariance matrix for Brown-Resnick and extremal Student-t distributions. Symmetric matrix of squared coefficients\lambda^2 for the Husler-Reiss model, with zero diagonal elements.

model

for multivariate extreme value distributions, users can choose between 1-parameter logistic and negative logistic, asymmetric logistic and negative logistic, bilogistic, Husler-Reiss, extremal Dirichlet model (Coles and Tawn) or the Dirichlet mixture. Spatial models includethe Brown-Resnick, Smith, Schlather and extremal Student max-stable processes. Max linear models are also supported

alg

algorithm, either simulation via extremal function ('ef') or via the spectral measure ('sm'). Default toef.

weights

vector of length m for the m mixture components that sum to one. For the "maxlin" model, weights should be a matrix with d columns that represent the weights of the components and whose columns sum to one (if provided, this argument overrides asy).

vario

semivariogram function whose first argument must be distance. Used only if provided in conjunction withcoord and ifsigma is missing

coord

d byk matrix of coordinates, used as input in the variogramvario or as parameter for the Smith model. Ifgrid isTRUE, unique entries should be supplied.

grid

Logical.TRUE if the coordinates are two-dimensional grid points (spatial models).

dist

symmetric matrix of pairwise distances. Default toNULL.

...

additional arguments for thevario function

Details

The vector param differs depending on the model

Stephenson points out that the multivariate asymmetric negative logistic model given in e.g. Coles and Tawn (1991) is not a valid distribution function in dimension d > 3 unless additional constraints are imposed on the parameter values. The implementation in mev uses the same construction as the asymmetric logistic distribution (see the vignette). As such it does not match the bivariate implementation of rbvevd.

The dependence parameter of the evd package for the Husler-Reiss distribution can be recovered by taking, for the Brown-Resnick model, 2/r = \sqrt{2\gamma(h)}, where h is the lag vector between sites, and r = 1/\lambda for the Husler-Reiss model.

Value

an n by d exact sample from the corresponding multivariate extreme value model

Warning

As of version 1.8 (August 16, 2016), there is a distinction between models hr and br. The latter is meant to be used in conjunction with variograms. The parametrization differs between the two models.

The family of scaled Dirichlet distributions is now parametrized by an additional parameter greater than -\min(\alpha), appended to the d vector param containing the parameter alpha of the Dirichlet model. Arguments model='dir' and model='negdir' are still supported internally, but are not listed in the options.

Author(s)

Leo Belzile

References

Dombry, Engelke and Oesting (2016). Exact simulation of max-stable processes,Biometrika,103(2), 303–317.

See Also

rmevspec,rmvevd,rbvevd

Examples

set.seed(1)
rmev(n=100, d=3, param=2.5, model='log', alg='ef')
rmev(n=100, d=4, param=c(0.2,0.1,0.9,0.5), model='bilog', alg='sm')
## Spatial example using power variogram
#NEW: Semi-variogram must take distance as argument
semivario <- function(x, scale, alpha){ scale*x^alpha }
#grid specification
grid.coord <- as.matrix(expand.grid(runif(4), runif(4)))
rmev(n=100, vario=semivario, coord=grid.coord, model='br', scale = 0.5, alpha = 1)
#using the Brown-Resnick model with a covariance matrix
vario2cov <- function(coord, semivario, ...){
 sapply(1:nrow(coord), function(i) sapply(1:nrow(coord), function(j)
  semivario(sqrt(sum((coord[i,])^2)), ...) +
  semivario(sqrt(sum((coord[j,])^2)), ...) -
  semivario(sqrt(sum((coord[i,]-coord[j,])^2)), ...)))
}
rmev(n=100, sigma=vario2cov(grid.coord, semivario = semivario, scale = 0.5, alpha = 1), model='br')
# asymmetric logistic model - see function 'rmvevd' from package 'evd'
asy <- list(0, 0, 0, 0, c(0,0), c(0,0), c(0,0), c(0,0), c(0,0), c(0,0),
  c(.2,.1,.2), c(.1,.1,.2), c(.3,.4,.1), c(.2,.2,.2), c(.4,.6,.2,.5))
rmev(n=1, d=4, param=0.3, asy=asy, model="alog")
#Example with a grid (generating an array)
rmev(n=10, sigma=cbind(c(2,1), c(1,3)), coord=cbind(runif(4), runif(4)), model='smith', grid=TRUE)
## Example with Dirichlet mixture
alpha.mat <- cbind(c(2,1,1), c(1,2,1), c(1,1,2))
rmev(n=100, param=alpha.mat, weights=rep(1/3,3), model='dirmix')
rmev(n=10, param=c(0.1,1,2,3), d=3, model='pairbeta')

Random samples from spectral distributions of multivariate extreme value models.

Description

Generate from Q_i, the spectral measure of a given multivariate extreme value model based on the L1 norm.

Usage

rmevspec(
  n,
  d,
  param,
  sigma,
  model = c("log", "neglog", "bilog", "negbilog", "hr", "br", "xstud", "smith",
    "schlather", "ct", "sdir", "dirmix", "pairbeta", "pairexp", "wdirbs", "wexpbs",
    "maxlin"),
  weights = NULL,
  vario = NULL,
  coord = NULL,
  grid = FALSE,
  dist = NULL,
  ...
)

Arguments

n

number of observations

d

dimension of sample

param

parameter vector for the logistic, bilogistic, negative bilogistic and extremal Dirichlet (Coles and Tawn) models. Parameter matrix for the Dirichlet mixture. Degrees of freedom for the extremal Student model. See Details.

sigma

covariance matrix for Brown-Resnick and extremal Student-t distributions. Symmetric matrix of squared coefficients\lambda^2 for the Husler-Reiss model, with zero diagonal elements.

model

for multivariate extreme value distributions, users can choose between 1-parameter logistic and negative logistic, asymmetric logistic and negative logistic, bilogistic, Husler-Reiss, extremal Dirichlet model (Coles and Tawn) or the Dirichlet mixture. Spatial models includethe Brown-Resnick, Smith, Schlather and extremal Student max-stable processes. Max linear models are also supported

weights

vector of length m for the m mixture components that sum to one. For the "maxlin" model, weights should be a matrix with d columns that represent the weights of the components and whose columns sum to one (if provided, this argument overrides asy).

vario

semivariogram function whose first argument must be distance. Used only if provided in conjunction withcoord and ifsigma is missing

coord

d byk matrix of coordinates, used as input in the variogramvario or as parameter for the Smith model. Ifgrid isTRUE, unique entries should be supplied.

grid

Logical.TRUE if the coordinates are two-dimensional grid points (spatial models).

dist

symmetric matrix of pairwise distances. Default toNULL.

...

additional arguments for thevario function

Details

The vector param differs depending on the model

Value

an n by d exact sample from the corresponding multivariate extreme value model

Note

This functionality can be useful to generate for example Pareto processes with marginal exceedances.

Author(s)

Leo Belzile

References

Dombry, Engelke and Oesting (2016). Exact simulation of max-stable processes,Biometrika,103(2), 303–317.

Boldi (2009). A note on the representation of parametric models for multivariate extremes.Extremes12, 211–218.

Examples

set.seed(1)
rmevspec(n=100, d=3, param=2.5, model='log')
rmevspec(n=100, d=3, param=2.5, model='neglog')
rmevspec(n=100, d=4, param=c(0.2,0.1,0.9,0.5), model='bilog')
rmevspec(n=100, d=2, param=c(0.8,1.2), model='ct') #Dirichlet model
rmevspec(n=100, d=2, param=c(0.8,1.2,0.5), model='sdir') #with additional scale parameter
#Variogram gamma(h) = scale*||h||^alpha
#NEW: Variogram must take distance as argument
vario <- function(x, scale=0.5, alpha=0.8){ scale*x^alpha }
#grid specification
grid.coord <- as.matrix(expand.grid(runif(4), runif(4)))
rmevspec(n=100, vario=vario, coord=grid.coord, model='br')
## Example with Dirichlet mixture
alpha.mat <- cbind(c(2,1,1), c(1,2,1), c(1,1,2))
rmevspec(n=100, param=alpha.mat, weights=rep(1/3,3), model='dirmix')

Multivariate Normal distribution sampler

Description

Sampler derived using the eigendecomposition of the covariance matrix Sigma. The function uses the Armadillo random normal generator.

Usage

rmnorm(n, mu, Sigma)

Arguments

n

sample size

mu

mean vector. Will set the dimension

Sigma

a square covariance matrix, of the same dimension as mu. No sanity check is performed to validate that the matrix is positive definite, so use at your own risk.

Value

a sample of size n from a multivariate Normal distribution

Examples

rmnorm(n = 10, mu = c(0,2), Sigma = diag(2))

Simulation from R-Pareto processes

Description

Simulation from R-Pareto processes

Usage

rparp(
  n,
  shape = 1,
  risk = c("sum", "site", "max", "min", "l2"),
  siteindex = NULL,
  d,
  param,
  sigma,
  model = c("log", "neglog", "bilog", "negbilog", "hr", "br", "xstud", "smith",
    "schlather", "ct", "sdir", "dirmix"),
  weights,
  vario,
  coord = NULL,
  ...
)

Arguments

n

number of observations

shape

shape tail index of Pareto variable

risk

string indicating risk functional.

siteindex

integer between 1 and d specifying the index of the site or variable

d

dimension of sample

param

parameter vector for the logistic, bilogistic, negative bilogistic and extremal Dirichlet (Coles and Tawn) models. Parameter matrix for the Dirichlet mixture. Degrees of freedom for the extremal Student model. See Details.

sigma

covariance matrix for Brown-Resnick and extremal Student-t distributions. Symmetric matrix of squared coefficients\lambda^2 for the Husler-Reiss model, with zero diagonal elements.

model

for multivariate extreme value distributions, users can choose between 1-parameter logistic and negative logistic, asymmetric logistic and negative logistic, bilogistic, Husler-Reiss, extremal Dirichlet model (Coles and Tawn) or the Dirichlet mixture. Spatial models includethe Brown-Resnick, Smith, Schlather and extremal Student max-stable processes. Max linear models are also supported

weights

vector of length m for the m mixture components that sum to one. For the "maxlin" model, weights should be a matrix with d columns that represent the weights of the components and whose columns sum to one (if provided, this argument overrides asy).

vario

semivariogram function whose first argument must be distance. Used only if provided in conjunction withcoord and ifsigma is missing

coord

d byk matrix of coordinates, used as input in the variogramvario or as parameter for the Smith model. Ifgrid isTRUE, unique entries should be supplied.

...

additional arguments for thevario function

Details

For risk = 'max' and risk = 'min', the procedure uses rejection sampling based on Pareto variates sampled from 'sum' and may be slow if d is large.

Value

an n by d sample from the R-Pareto process, with attributes accept.rate if the procedure uses rejection sampling.

Examples

rparp(n=10, risk = 'site', siteindex=2, d=3, param=2.5, model='log')
rparp(n=10, risk = 'min', d=3, param=2.5, model='neglog')
rparp(n=10, risk = 'max', d=4, param=c(0.2,0.1,0.9,0.5), model='bilog')
rparp(n=10, risk = 'sum', d=3, param=c(0.8,1.2,0.6, -0.5), model='sdir')
vario <- function(x, scale=0.5, alpha=0.8){ scale*x^alpha }
grid.coord <- as.matrix(expand.grid(runif(4), runif(4)))
rparp(n=10, risk = 'max', vario=vario, coord=grid.coord, model='br')

Simulation from Pareto processes using composition sampling

Description

The algorithm performs forward sampling by simulating first from a mixture, then samples angles conditional on them being less than (max) or greater than (min) one. The resulting sample from the angular distribution is then multiplied by Pareto variates with tail index shape.

Usage

rparpcs(
  n,
  model = c("log", "neglog", "br", "xstud"),
  risk = c("max", "min"),
  param = NULL,
  d,
  Lambda = NULL,
  Sigma = NULL,
  df = NULL,
  shape = 1,
  ...
)

Arguments

n

sample size.

model

string indicating the model family.

risk

string indicating the risk functional. Onlymax andmin are currently supported.

param

parameter value for the logistic or negative logistic model

d

dimension of the multivariate model, only needed for logistic or negative logistic models

Lambda

parameter matrix for the Brown–Resnick model. SeeDetails.

Sigma

correlation matrix ifmodel = 'xstud', otherwisethe covariance matrix formed from the stationary Brown-Resnick process.

df

degrees of freedom for extremal Student process.

shape

tail index of the Pareto variates (reciprocal shape parameter). Must be strictly positive.

...

additional parameters, currently ignored

Details

For the moment, only exchangeable models and models based on elliptical processes are handled.

The parametrization of the Brown-Resnick model is in terms of the matrix Lambda, which is formed by evaluating the semivariogram \gamma at sites s_i, s_j, meaning that \Lambda_{i,j} = \gamma(s_i, s_j)/2.

The argument Sigma is ignored for the Brown-Resnick model if Lambda is provided by the user.

Value

an n by d matrix of samples, where d = ncol(Sigma), with attributes mixt.weights.

Author(s)

Leo Belzile

See Also

rparp for general simulation of Pareto processes based on an accept-reject algorithm.

Examples

## Not run: 
#Brown-Resnick, Wadsworth and Tawn (2014) parametrization
D <- 20L
coord <- cbind(runif(D), runif(D))
semivario <- function(d, alpha = 1.5, lambda = 1){0.5 * (d/lambda)^alpha}
Lambda <- semivario(as.matrix(dist(coord))) / 2
rparpcs(n = 10, Lambda = Lambda, model = 'br', shape = 0.1)
#Extremal Student
Sigma <- stats::rWishart(n = 1, df = 20, Sigma = diag(10))[,,1]
rparpcs(n = 10, Sigma = cov2cor(Sigma), df = 3, model = 'xstud')
## End(Not run)

Simulation of generalized Huesler-Reiss Pareto vectors via composition sampling

Description

Sample from the generalized Pareto process associated with Huesler-Reiss spectral profiles. For the Huesler-Reiss Pareto vectors, the matrix Sigma is used to build Q, viz.

Q = \Sigma^{-1} - \frac{\Sigma^{-1}\mathbf{1}_d\mathbf{1}_d^\top\Sigma^{-1}}{\mathbf{1}_d^\top\Sigma^{-1}\mathbf{1}_d}.

The location vector m and Sigma are the parameters of the underlying log-Gaussian process.

Usage

rparpcshr(n, u, alpha, Sigma, m)

Arguments

n

sample size

u

vector of marginal location parameters (must be strictly positive)

alpha

vector of shape parameters (must be strictly positive).

Sigma

covariance matrix of the process, used to define Q. See Details.

m

location vector of Gaussian distribution.

Value

n by d matrix of observations

References

Ho, Z. W. O and C. Dombry (2019), Simple models for multivariate regular variations and theHuesler-Reiss Pareto distribution, Journal of Multivariate Analysis (173), p. 525-550,doi:10.1016/j.jmva.2019.04.008

Examples

D <- 20L
coord <- cbind(runif(D), runif(D))
di <- as.matrix(dist(rbind(c(0, ncol(coord)), coord)))
semivario <- function(d, alpha = 1.5, lambda = 1){(d/lambda)^alpha}
Vmat <- semivario(di)
Sigma <- outer(Vmat[-1, 1], Vmat[1, -1], '+') - Vmat[-1, -1]
m <- Vmat[-1, 1]
## Not run: 
samp <- rparpcshr(n = 100, u = c(rep(1, 10), rep(2, 10)),
          alpha = seq(0.1, 1, length = 20), Sigma = Sigma, m = m)
## End(Not run)

Simulate r-largest observations from point process of extremes

Description

Simulate the r-largest observations from a Poisson point process with intensity

\Lambda(x) = (1+\xi(x-\mu)/\sigma)^{-1/\xi}

.

Usage

rrlarg(n, r, loc = 0, scale = 1, shape = 0)

Arguments

n

sample size

r

number of observations per block

loc

location parameter

scale

scale parameter

shape

shape parameter

Value

an n by r matrix of samples from the point process, ordered from largest to smallest in each row.
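
Examples

A minimal sketch (not an official package example): each row holds the r largest points of one block, ordered from largest to smallest.

samp <- rrlarg(n = 10, r = 3, loc = 0, scale = 1, shape = 0.1)
dim(samp)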


Variogram model of Schlather and Moreva

Description

The variogram model is

\gamma(h) = \frac{\{1+(\|h\|/\lambda)^\alpha\}^{\beta/\alpha}-1}{2^{\beta/\alpha}-1}, \quad 0 < \alpha \leq 2, \beta \leq 2.

The model is defined at\beta=0 by continuity.

Usage

schlather.vario(h, alpha, beta, scale = 1)

Arguments

h

vector or matrix of pairwise distances

alpha

smoothness parameter

beta

shape parameter, must be less than 2

scale

scale parameter

Value

a vector or matrix of variogram values of the same length as h
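
Examples

A minimal sketch (not an official package example), evaluating the variogram over a grid of distances; the case beta = 0 is defined by continuity.

h <- seq(0, 5, by = 0.5)
schlather.vario(h, alpha = 1.5, beta = 1, scale = 1)
schlather.vario(h, alpha = 1.5, beta = 0, scale = 1)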


Ramos and Ledford test of independence

Description

The Ramos and Ledford (2005) score test of independence is a modification of tests by Tawn (1988) and Ledford and Tawn (1996) for the logistic model parameter \alpha = 1; the latter two have scores with zero expectation, but the variance of the score is infinite, which produces non-regularity and yields tests that, once suitably normalized, converge slowly to their asymptotic null distribution. The test, designed for bivariate samples, transforms observations to have unit Frechet margins and considers a bivariate censored likelihood approach for the logistic distribution.

Usage

scoreindep(xdat, p, test = c("ledford", "tawn"))

Arguments

xdat

an n by 2 matrix of observations

p

probability level for the marginal threshold

test

string; if tawn, only censor observations in the upper quadrant when both variables are large as in Tawn (1988), otherwise censor marginally for ledford as in Ledford and Tawn (1996).

Value

a list with elements

stat

value of the score test statistic

pval

asymptotic p-value

test

test argument
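
Examples

A minimal sketch (not an official package example), mirroring the test.scoreindep example later in this manual.

samp <- rmev(n = 1000, d = 2, param = 0.99, model = "log")
scoreindep(samp, p = 0.9)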


Exponential regression estimator

Description

This function implements the exponential regression estimator of the shape parameter for the case of Pareto tails with positive shape index.

Usage

shape.erm(xdat, k, method = c("bdgm", "fh"), bounds = NULL)

Arguments

xdat

vector of observations

k

vector of integer, the number of largest observations to consider

method

string; one ofbdgm for the approach of Beirlant, Dierckx, Goegebeur and Matthys (1999) orfh for Feuerverger and Hall (1999)

bounds

vector of length 2 giving the bounds forrho, the second order parameter. Default to\rho \in [-5, -0.5]

Details

The second-order parameter is difficult to pin down and, while values within [-1, 0) are most logical under the Hall model, the model parameters become unidentifiable as \rho \to 0. The default constraint restricts -5 < \rho < -0.5, with the upper bound changed to -0.25 for samples of more than 5000 observations. Users can set the bounds for \rho via the argument bounds. The optimization is initialized at the Hill estimator.

Value

a data frame with columns

References

Feuerverger, A. and P. Hall (1999), Estimating a tail exponent by modelling departure from a Pareto distribution,The Annals of Statistics 27(2), 760-781. <doi:10.1214/aos/1018031215>

Beirlant, J., Dierckx, G., Goegebeur, Y. G. Matthys (1999). Tail Index Estimation and an Exponential Regression Model.Extremes, 2, 177–200 (1999). <doi:10.1023/A:1009975020370>
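
Examples

A minimal sketch (not an official package example), comparing the two documented estimation methods on simulated Pareto-tailed data.

xdat <- rgp(n = 1000, shape = 0.5)
shape.erm(xdat, k = seq(50L, 250L, by = 50L), method = "bdgm")
shape.erm(xdat, k = seq(50L, 250L, by = 50L), method = "fh")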


Generalized jackknife shape estimator

Description

Generalized jackknife shape estimator

Usage

shape.genjack(xdat, k)

Arguments

xdat

vector of positive observations

k

vector of order statistics; if missing, a vector going from 10 to sample size minus one.

Value

a data frame with the number of order statistics k and the shape parameter estimate shape, or a single numeric value if k is a scalar integer.

References

Gomes, I.M., João Martins, M. and Neves, M. (2000) Alternatives to a Semi-Parametric Estimator of Parameters of Rare Events-The Jackknife Methodology.Extremes, 3, 207–229.doi:10.1023/A:1011470010228
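
Examples

A minimal sketch (not an official package example) over a range of order statistics; estimates can be compared with those of shape.hill.

xdat <- rgp(n = 500, shape = 0.4)
shape.genjack(xdat, k = 20:100)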


Beirlant et al. generalized quantile shape estimator

Description

This function estimates the real-valued shape parameter from generalized quantile plots based on mean excess functions, generalized median excesses or trimmed mean excesses.

Usage

shape.genquant(
  xdat,
  k,
  type = c("genmean", "genmed", "trimmean"),
  weight,
  p = 0.5
)

Arguments

xdat

n vector of observations

k

number of upper order statistics

type

string indicating the estimator choice, one ofgenmean,genmed andtrimmean.

weight

a kernel function on [0,1]

p

number between zero and one giving the proportion of order statistics for the second threshold

References

Beirlant, J., Vynckier P. and J.L. Teugels (1996).Excess functions and estimation of the extreme-value index. Bernoulli, 2(4), 293-318.


Hill's estimator of the shape parameter

Description

Given a sample of positive observations, calculate the tail index or shape parameter. The shape estimate returned is positive.

Usage

shape.hill(xdat, k)

Arguments

xdat

vector of positive observations

k

vector of order statistics; if missing, a vector going from 10 to sample size minus one.

Value

a data frame with the number of order statisticsk and the shape parameter estimateshape, or a single numeric value ifk is a scalar integer.

References

Hill, B.M. (1975).A simple general approach to inference about the tail of a distribution. Annals of Statistics,3, 1163-1173.

Examples

xdat <- mev::rgp(n = 200, loc = 1, scale = 0.5, shape = 0.5)
shape.hill(xdat)

Lower-trimmed Hill estimator for the shape parameter

Description

Given a Pareto-tailed sample (positive tail index), compute the lower-trimmed Hill estimator. If k0 = k, the estimator reduces to Hill's estimator of the shape index.

Usage

shape.lthill(xdat, k, k0 = k, sorted = FALSE, ...)

Arguments

xdat

[numeric] vector of positive observations

k

[integer] number of order statistics for the threshold

k0

[integer] vector of number of largest order statistics, no greater thank

sorted

[logical] ifTRUE, data are assumed to be sorted in decreasing order.

...

additional arguments for other routines (notablyvectorize)

Value

a scalar with the shape parameter estimate ifk0 is a scalar, otherwise a data frame with columnsk0 for the number of exceedances andshape for the tail index.

References

Bladt, M., Albrecher, H. & Beirlant, J. (2020)Threshold selection and trimming in extremes. Extremes, 23, 629-665 .doi:10.1007/s10687-020-00385-0

Examples

# Pareto sample
n <- 200
xdat <- 10/(1 - runif(n)) - 10
shape.lthill(xdat = xdat, k = 100, k0 = 5:100)

Dekkers and de Haan moment estimator for the shape

Description

Given a sample of exceedances, compute the moment estimator of the real shape parameter.

Usage

shape.moment(xdat, k)

Arguments

xdat

vector of positive observations of lengthn

k

number of largest order statistics

Value

a data frame with the number of order statisticsk and the shape parameter estimateshape, or a single numeric value ifk is a scalar.

References

Dekkers, A.L.M. and de Haan, L. (1989).On the estimation of the extreme-value index and large quantile estimation., Annals of Statistics,17, 1795-1833.
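
Examples

A minimal sketch (not an official package example) on simulated generalized Pareto data with shape 0.1.

xdat <- rgp(n = 1000, shape = 0.1)
shape.moment(xdat, k = seq(20L, 200L, by = 20L))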


Extreme U-statistic Pickands estimator

Description

Given a random sample of n exceedances, the estimator returns an estimate of the shape parameter or extreme value index using a kernel of order 3, based on the k largest exceedances of xdat. Note that the method does not allow for ties.

Usage

shape.osz(xdat, k, ...)

Arguments

xdat

vector of observations of lengthn

k

number of largest order statistics3 \leq k < n.

...

additional arguments for backward compatibility

Details

The calculations are based on the recursions provided in Lemma 4.3 of Oorschot et al.

References

Oorschot, J, J. Segers and C. Zhou (2023), Tail inference using extreme U-statistics, Electronic Journal of Statistics 17(1): 1113-1159.doi:10.1214/23-EJS2129

Examples

xdat <- rgp(n = 1000, shape = 0.2)
shape.osz(xdat, k = 10)

Pickands' shape estimator

Description

Given a sample of size n of positive exceedances, compute the real shape parameter \xi based on the k largest order statistics.

Usage

shape.pickands(xdat, k)

Arguments

xdat

vector of positive observations of lengthn

k

number of largest order statistics

Value

a data frame with the number of order statisticsk and the shape parameter estimateshape, or a single numeric value ifk is a scalar.

References

Pickands, III, J. (1975).Statistical inference using extreme order statistics. Annals of Statistics,3, 119-131.
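
Examples

A minimal sketch (not an official package example) of Pickands' estimator over a grid of order statistics.

xdat <- rgp(n = 1000, shape = 0.2)
shape.pickands(xdat, k = seq(20L, 240L, by = 20L))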


Random block maxima shape estimator of Wager

Description

Computes the shape estimator for a varying number k of largest observations, by default up to half the sample size.

Usage

shape.rbm(xdat, k = 10:floor(length(xdat)/2), ...)

Arguments

xdat

[vector] sample exceedances

k

[int] vector of integers for which to compute the estimator

...

additional parameters, currently ignored

Value

a list with elements

k

number of exceedances

shape

tail index estimate, strictly positive

risk

empirical Bayes estimate of risk

thresh

threshold given by the smallest order statistic considered in the sample

References

Wager, S. (2014). Subsampling extremes: From block maxima to smooth tail estimation,Journal of Multivariate Analysis, 130, 335-353,doi:10.1016/j.jmva.2014.06.010
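
Examples

A minimal sketch (not an official package example) on simulated exceedances over a grid of k values.

xdat <- rgp(n = 400, shape = 0.5)
shape.rbm(xdat, k = 10:100)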


Trimmed Hill estimator for the shape parameter

Description

Given a Pareto-tailed sample (positive tail index), compute the trimmed Hill estimator. If k0 = k, the estimator reduces to Hill's estimator of the shape index.

Usage

shape.trimhill(xdat, k, k0, sorted = FALSE)

Arguments

xdat

[numeric] vector of positive observations

k

[integer] number of order statistics for the threshold

k0

[integer] number of largest order statistics, strictly less thank

sorted

[logical] ifTRUE, data are assumed to be sorted in decreasing order

Value

a scalar with the shape parameter estimate

References

Bhattacharya, S., Kallitsis, M. and S. Stoev, (2019) Data-adaptive trimming of the Hill estimator and detection of outliers in the extremes of heavy-tailed data. Electronic Journal of Statistics 13, 1872–1925
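
Examples

A minimal sketch (not an official package example), trimming the ten largest order statistics out of one hundred.

xdat <- rgp(n = 500, shape = 0.5)
shape.trimhill(xdat, k = 100, k0 = 90)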


de Vries shape estimator

Description

Given a sample of exceedances, compute the moment estimator of the positive shape parameter using the ratio of the log ratio of exceedances and its square.

Usage

shape.vries(xdat, k)

Arguments

xdat

vector of positive observations of lengthn

k

number of largest order statistics

Value

a data frame with the number of order statisticsk and the shape parameter estimateshape, or a single numeric value ifk is a scalar.

References

de Haan, L. and Peng, L. (1998). Comparison of tail index estimators.Statistica Neerlandica, 52: 60-70.doi:10.1111/1467-9574.00068
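
Examples

A minimal sketch (not an official package example) of the de Vries estimator on simulated generalized Pareto data.

xdat <- rgp(n = 1000, shape = 0.25)
shape.vries(xdat, k = seq(20L, 200L, by = 20L))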


Smith's penultimate approximations

Description

The function takes as arguments the distribution and density functions. There are two options: method='bm' yields block maxima and method='pot' threshold exceedances. For method='bm', the user should provide the block sizes via the argument m, whereas if method='pot', a vector of threshold values should be provided. The other argument (u or m, depending on the method) is ignored.

Usage

smith.penult(
  family,
  method = c("bm", "pot"),
  u = NULL,
  qu = NULL,
  m = NULL,
  returnList = TRUE,
  ...
)

Arguments

family

the name of the parametric family. Will be used to obtaindfamily,pfamily,qfamily

method

either block maxima ('bm') or peaks-over-threshold ('pot') are supported

u

vector of thresholds for method'pot'

qu

vector of quantiles for method'pot'. Ignored if argumentu is provided.

m

vector of block sizes for method'bm'

returnList

logical; should the results be returned as a list or as a matrix of parameters?

...

additional arguments passed todensF anddistF

Details

Alternatively, the user can provide functions densF, quantF and distF for the density, quantile and distribution functions, respectively. The user can also supply the derivative of the density function, ddensF. If the latter is missing, it will be approximated using finite differences.

Formethod = "pot", the function computes the reciprocal hazard and its derivative on the log scale to avoid numerical overflow. Thus, the density function should have argumentlog and the distribution function argumentslog.p andlower.tail, respectively.

Value

either a vector, a matrix if either length(m) > 1 or length(u) > 1, or a list (if returnList) containing

Author(s)

Leo Belzile

References

Smith, R.L. (1987). Approximations in extreme value theory.Technical report 205, Center for Stochastic Process, University of North Carolina, 1–34.

Examples

# Threshold exceedance for Normal variables
qu <- seq(1, 5, by = 0.02)
penult <- smith.penult(
  family = "norm",
  ddensF = function(x) {
    -x * dnorm(x)
  },
  method = 'pot',
  u = qu)
plot(
  x = qu,
  y = penult$shape,
  type = 'l',
  xlab = 'Quantile',
  ylab = 'Penultimate shape',
  ylim = c(-0.5, 0))
# Block maxima for Gamma variables -
# User must provide arguments for shape (or rate)
m <- seq(30, 3650, by = 30)
penult <- smith.penult(
  family = 'gamma',
  method = 'bm',
  m = m,
  shape = 0.1)
plot(
  x = m,
  penult$shape,
  type = 'l',
  xlab = 'Block size',
  ylab = 'Penultimate shape')
# Comparing density of GEV approximation with true density of maxima
m <- 100 # block of size 100
p <- smith.penult(
  family = 'norm',
  ddensF = function(x) {
    -x * dnorm(x)
  },
  method = 'bm',
  m = m,
  returnList = FALSE)
x <- seq(1, 5, by = 0.01)
plot(
  x,
  m * dnorm(x) * exp((m - 1) * pnorm(x, log.p = TRUE)),
  type = 'l',
  ylab = 'Density',
  main = 'Distribution of the maxima of\n 100 standard normal variates')
lines(x, mev::dgev(x, loc = p[1], scale = p[2], shape = 0), col = 2)
lines(x, mev::dgev(x, loc = p[1], scale = p[2], shape = p[3]), col = 3)
legend(
  x = 'topright',
  lty = c(1, 1, 1),
  col = c(1, 2, 3),
  legend = c('exact', 'ultimate', 'penultimate'),
  bty = 'n')

Smith's third penultimate approximation

Description

This function returns the density and distribution functions of the 3rd penultimate approximation for extremes of Smith (1987). It requires knowledge of the exact constants \epsilon and \rho described in the paper.

Usage

smith.penult.fn(
  loc,
  scale,
  shape,
  eps,
  rho = NULL,
  method = c("bm", "pot"),
  mdaGumbel = FALSE,
  ...
)

Arguments

loc

location parameter returned bysmith.penult or threshold vector

scale

scale parameter returned bysmith.penult

shape

shape parameter returned bysmith.penult

eps

parameter vector, seeDetails.

rho

second-order parameter, model dependent

method

one ofpot for the generalized Pareto orbm for the generalized extreme value distribution

mdaGumbel

logical indicating whether the functionH_{\rho} should be replaced byx^3/6; seeDetails.

...

additional parameters, currently ignored. These are used for backward compatibility due to a change in the names of the arguments.

Details

Let F, f denote respectively the distribution and density functions and define the function \phi(x) as

\phi(x)=-\frac{F(x)\log F(x)}{f(x)}

for block maxima. The sequence loc corresponds to b_n otherwise, defined as the solution of F(b_n) = \exp(-1/n).

The scale is given by a_n = \phi(b_n), the shape as \gamma_n = \phi'(b_n). These are returned by a call to smith.penult.

For threshold exceedances, b_n is replaced by the sequence of thresholds u and we take instead \phi(x) to be the reciprocal hazard function \phi(x) = (1-F(x))/f(x).

In cases where the distribution function is in the maximum domain of attraction of the Gumbel distribution, \rho is possibly undetermined and \epsilon can be equal to \phi(b_n)\phi''(b_n).

For distributions in the maximum domain of attraction of the Gumbel distribution and that are of class N, it is also possible to abstract from the \rho parameter by substituting the function H_{\rho} by x^3/6 without affecting the rate of convergence. This can be done by setting mdaGumbel=TRUE in the function call.

Warning

The third penultimate approximation does not yield a valid distribution function over the whole range of the original distribution, but is rather valid in a neighborhood of the true support of the distribution of maxima/threshold exceedances. The function handles the most standard failure (decreasing distribution function and negative densities), but any oscillatory behaviour will not necessarily be captured. This is inherent to the method and can be resolved by not evaluating the functions F and f at the faulty points.

References

Smith, R.L. (1987). Approximations in extreme value theory.Technical report 205, Center for Stochastic Process, University of North Carolina, 1–34.

Examples

#Normal maxima example from Smith (1987)
m <- 100 #block of size 100
p <- smith.penult(family='norm', 
  ddensF=function(x){-x*dnorm(x)}, method='bm', m=m, returnList=FALSE)
approx <- smith.penult.fn(loc=p[1], scale=p[2], shape=p[3], 
  eps=p[3]^2+p[3]+p[2]^2, mdaGumbel=TRUE, method='bm')
x <- seq(0.5, 6, by=0.001)
#First penultimate approximation
plot(x, exp(m*pnorm(x, log.p=TRUE)), type='l', ylab='CDF',
  main='Distribution of the maxima of\n 100 standard normal variates')
lines(x, mev::pgev(x, loc=p[1], scale=p[2], shape=0), col=2)
lines(x, mev::pgev(x, loc=p[1], scale=p[2], shape=p[3]), col=3)
lines(x, approx$F(x), col=4)
legend(x='bottomright', lty=c(1,1,1,1), col=c(1,2,3,4),
  legend=c('Exact','1st approx.','2nd approx.','3rd approx'), bty='n')
plot(x, m*dnorm(x)*exp((m-1)*pnorm(x, log.p=TRUE)), type='l', ylab='Density',
  main='Distribution of the maxima of\n 100 standard normal variates')
lines(x, mev::dgev(x, loc=p[1], scale=p[2], shape=0), col=2)
lines(x, mev::dgev(x, loc=p[1], scale=p[2], shape=p[3]), col=3)
lines(x, approx$f(x), col=4)
legend(x='topright', lty=c(1,1,1,1), col=c(1,2,3,4),
  legend=c('Exact','1st approx.','2nd approx.','3rd approx'), bty='n')
#Threshold exceedances
par <- smith.penult(family = "norm", ddensF=function(x){-x*dnorm(x)},
  method='pot', u=4, returnList=FALSE)
approx <- smith.penult.fn(loc=par[1], scale=par[2], shape=par[3],
  eps=par[3]^2+par[3]+par[2]^2, mdaGumbel=TRUE, method='pot')
x <- seq(4.01, 7, by=0.01)
#Distribution function
plot(x, 1-(1-pnorm(x))/(1-pnorm(par[1])), type='l', ylab='Conditional CDF',
  main='Exceedances over 4\n for standard normal variates')
lines(x, mev::pgp(x, loc=par[1], scale=par[2], shape=0), col=2)
lines(x, mev::pgp(x, loc=par[1], scale=par[2], shape=par[3]), col=3)
lines(x, approx$F(x), col=4)
#Density
plot(x, dnorm(x)/(1-pnorm(par[1])), type='l', ylab='Conditional density',
  main='Exceedances over 4\n for standard normal variates')
lines(x, dgp(x, loc=par[1], scale=par[2], shape=0), col=2)
lines(x, dgp(x, loc=par[1], scale=par[2], shape=par[3]), col=3)
lines(x, approx$f(x), col=4)

Spline correction for Fraser-Reid approximations

Description

The tangent exponential model can be numerically unstable for values close to r = 0. This function corrects these incorrect values, which are interpolated using splines. The function takes as input an object of class fr and returns the same object with different rstar values.

Usage

spline.corr(fr, method = c("cobs", "smooth.spline"))

Arguments

fr

an object of classfr, normally the output ofgpd.tem orgev.tem.

method

string for the method, eithercobs (constrained robust B-spline from eponym package) orsmooth.spline

Details

If available, the function uses cobs from the eponymous package. The latter handles constraints and smoothness penalties, and is more robust than the equivalent smooth.spline.

Value

an object of classfr, containing as additional argumentsspline and a modifiedrstar argument.

Warning

While penalized (robust) splines often do a good job of capturing and correcting for numerical outliers and NA values, they may also be driven by unusual values lying on the profile log-likelihood curve, or fail to detect outliers (or falsely identify 'correct' values as outliers). The user should always validate by comparing the plot of the uncorrected (raw) output of the object with that of spline.corr.


Semi-parametric marginal transformation to uniform

Description

The function spunif transforms a matrix or vector of data x to the pseudo-uniform scale using a semiparametric transform. Data below the threshold are transformed to pseudo-uniforms using a rank transform, while data above the threshold are assumed to follow a generalized Pareto distribution. The parameters of the latter are estimated using maximum likelihood if either scale = NULL or shape = NULL.

Usage

spunif(x, thresh, scale = NULL, shape = NULL)

Arguments

x

matrix or vector of data

thresh

vector of marginal thresholds

scale

vector of marginal scale parameters for the generalized Pareto

shape

vector of marginal shape parameters for the generalized Pareto

Value

a matrix or vector of the same dimension as x, with pseudo-uniform observations

Author(s)

Leo Belzile

Examples

x <- rmev(1000, d = 3, param = 2, model = 'log')
thresh <- apply(x, 2, quantile, 0.95)
spunif(x, thresh)

Stein's weighted generalized Pareto likelihood

Description

Stein's weighted generalized Pareto likelihood

Usage

stein_gp_lik(pars, xdat, weights = rep(1, length(xdat)), ...)

Arguments

pars

vector of length 2 with the scale and shape parameters

xdat

vector of exceedances in decreasing order ifsort=FALSE

weights

vector of weights

...

additional arguments, currently ignored

References

Stein, M.L. (2023). A weighted composite log-likelihood approach to parametric estimation of the extreme quantiles of a distribution.Extremes26, 469-507 <doi:10.1007/s10687-023-00466-w>
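
Examples

A minimal sketch (an assumption, not an official example): evaluate the weighted likelihood at a candidate scale and shape for exceedances sorted in decreasing order.

xdat <- sort(rexp(100), decreasing = TRUE)
stein_gp_lik(pars = c(1, 0.1), xdat = xdat)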


Coefficient of tail correlation and tail dependence

Description

For data with unit Pareto margins, the coefficient of tail dependence \eta is defined via

\Pr(\min(X) > x) = L(x)x^{-1/\eta},

where L(x) is a slowly varying function. Ignoring the latter, several estimators of \eta can be defined. In unit Pareto margins, \eta is a nonnegative shape parameter that can be estimated by fitting a generalized Pareto distribution above a high threshold. In exponential margins, \eta is a scale parameter and the maximum likelihood estimator of the latter is the Hill estimator. Both methods are based on peaks-over-threshold and the user can choose between pointwise confidence intervals obtained through a likelihood ratio test statistic ("lrt") or the Wald statistic ("wald").

Usage

taildep(
  xdat,
  qlev = NULL,
  nq = 40,
  qlim = c(0.8, 0.99),
  depmeas = c("eta", "chi"),
  estimator = list(eta = c("emp", "betacop", "gpd", "hill"), chi = c("emp", "betacop")),
  confint = c("wald", "lrt"),
  level = 0.95,
  trunc = TRUE,
  margtrans = c("emp", "none"),
  ties.method = "random",
  plot = TRUE,
  ...
)

Arguments

xdat

an n by d matrix of multivariate observations

qlev

vector of percentiles between 0 and 1

nq

number of quantiles of the structural variable at which to form a grid; only used ifu = NULL.

qlim

limits for the sequenceu of the structural variable

depmeas

dependence measure, either of"eta" or"chi"

estimator

named list giving the estimation method foreta andchi. Default to"emp" for both.

confint

string indicating the type of confidence interval for\eta, one of"wald" or"lrt"

level

the confidence level required (default to 0.95).

trunc

logical indicating whether the estimates and confidence intervals should be truncated in[0,1]

margtrans

marginal transformation; if"none", data are assumed to be in uniform margins

ties.method

string indicating the type of method forrank; seerank for a list of options. Default to"random"

plot

logical; should graphs be plotted?

...

additional arguments passed toplot; current support formain,xlab,ylab,add and furtherpch,lty,type,col for points; additional arguments for confidence intervals are handled viacipch,cilty,citype,cicol.

Details

The most common approach for estimation is the empirical survival copula, evaluating the proportion of sample minima with uniform margins that exceed a given x. An alternative estimator uses a smoothed estimator of the survival copula based on Bernstein polynomials, resulting in the so-called betacop estimator. Approximate pointwise confidence intervals for the latter are obtained by assuming the proportion of points is binomial.

The coefficient of tail correlation\chi is

\chi = \lim_{u \to 1} \frac{\Pr(F_1(X_1)>u, \ldots, F_D(X_D)>u)}{1-u}.

Asymptotically independent vectors have\chi = 0. The estimator uses an estimator of the survival copula

Value

a named list with elements

Note

As of version 1.15, the percentiles used are those of the minimum variable. This ensures that, regardless of the number of variables, there is no error message returned because the quantile levels are too low for there to be observations.

See Also

chiplot for bivariate empirical estimates of\chi and\bar{\chi}.

Examples

## Not run: 
set.seed(765)
# Max-stable model
dat <- rmev(n = 1000, d = 4, param = 0.7, model = "log")
taildep(dat, confint = 'wald')
## End(Not run)

Bridging the singularity for higher order asymptotics

Description

The correction factor \log(q/r)/r for the likelihood root is unbounded in the vicinity of the maximum likelihood estimator. The thesis of Rongcai Li (University of Toronto, 2001) explores different ways of bridging this singularity, notably using asymptotic expansions.

Usage

tem.corr(fr, print.warning = FALSE)

Arguments

fr

an object of classfr

print.warning

logical; should warning message be printed? Default toFALSE

Details

The poor man's method used here consists in fitting a robust regression to 1/q - 1/r as a function of r and using predictions from the model to solve for q. This approach is seemingly superior to that previously used in spline.corr.

Value

an object of classfr, containing as additional argumentsspline and a modifiedrstar argument.


P-P plot for testing max stability

Description

The diagnostic, proposed by Gabda, Towe, Wadsworth and Tawn, relies on the fact that, for max-stable vectors on the unit Gumbel scale, the distribution of the maxima is Gumbel with a location parameter equal to the exponent measure. One can thus consider tuples of size m, estimate the location parameter via maximum likelihood and transform observations to the standard Gumbel scale. Replicates are then pooled and empirical quantiles are defined. The number of combinations of m vectors can be prohibitively large, hence only nmax randomly selected tuples are kept from all possible combinations. The confidence intervals are obtained by a nonparametric bootstrap, resampling observations with replacement for the selected tuples and re-estimating the location parameter. The procedure can be computationally intensive as a result.

Usage

test.maxstab(
  xdat,
  m = prod(dim(dat)[-1]),
  nmax = 500L,
  B = 1000L,
  ties.method = "random",
  plot = TRUE,
  ...
)

Arguments

xdat

matrix or array of max-stable observations, typically block maxima. The first dimension should consist of replicates

m

integer indicating how many tuples should be aggregated.

nmax

maximum number of pairs. Default to 500L.

B

number of nonparametric bootstrap replications. Default to 1000L.

ties.method

string indicating the method forrank. Default to"random".

plot

logical indicating whether a graph should be produced (default toTRUE).

...

additional arguments for backward compatibility

Value

a Tukey probability-probability plot with 95% confidence intervals obtained using a nonparametric bootstrap

References

Gabda, D.; Towe, R. Wadsworth, J. and J. Tawn, Discussion of “Statistical Modeling of Spatial Extremes” by A. C. Davison, S. A. Padoan and M. Ribatet.Statist. Sci.27 (2012), no. 2, 189–192.

Examples

## Not run: 
xdat <- mev::rmev(n = 250, d = 100, param = 0.5, model = "log")
test.maxstab(xdat, m = 100)
xdat <- rmnorm(n = 250, Sigma = diag(0.5, 10) + matrix(0.5, 10, 10), mu = rep(0, 10))
test.maxstab(xdat, m = 2, nmax = 100)
test.maxstab(xdat, m = ncol(xdat))
## End(Not run)

Ramos and Ledford test of independence

Description

The Ramos and Ledford (2005) score test of independence is a modification of tests by Tawn (1988) and Ledford and Tawn (1996) for the logistic model parameter \alpha = 1; the latter two have scores with zero expectation, but the variance of the score is infinite, which produces non-regularity and yields tests that, once suitably normalized, converge slowly to their asymptotic null distribution. The test, designed for bivariate samples, transforms observations to have unit Frechet margins and considers a bivariate censored likelihood approach for the logistic distribution.

Usage

test.scoreindep(xdat, p, test = c("ledford", "tawn"))

Arguments

xdat

an n by 2 matrix of observations

p

probability level for the marginal threshold

test

string; if tawn, only censor observations in the upper quadrant when both variables are large as in Tawn (1988), otherwise censor marginally for ledford as in Ledford and Tawn (1996).

Value

a list with elements

stat

value of the score test statistic

pval

asymptotic p-value

test

test argument

Examples

samp <- rmev(n = 1000, d = 2,
    param = 0.99, model = "log")
(test.scoreindep(samp, p = 0.9))

Automatic L-moment ratio selection method

Description

Given a sample of observations, calculate the L-skewness and L-kurtosis over a set of candidate thresholds. For each candidate threshold, we find the value \tau_3 that minimizes the sum of squared distances between the sample L-skewness and L-kurtosis and their theoretical counterparts under the generalized Pareto distribution,

\min_{\tau_3} (t_3-\tau_3)^2 + [t_4 - \tau_3(1+5\tau_3)/(5+\tau_3)]^2.

The function returns the threshold with the minimum distance.

Usage

thselect.alrs(xdat, thresh, plot = FALSE)

Arguments

xdat

[numeric] vector of observations

thresh

[numeric] vector of candidate thresholds. If missing, 20 sample quantiles starting at the 0.25 quantile, in increments of 3.75 percent, are used.

plot

[logical] if TRUE, return a plot of the sample L-kurtosis against the L-skewness, along with the theoretical generalized Pareto curve.

Value

scalar for the chosen numeric threshold

References

Silva Lomba, J. and Fraga Alves, M.I. (2020). L-moments for automatic threshold selection in extreme value analysis. Stoch Environ Res Risk Assess, 34, 465–491. doi:10.1007/s00477-020-01789-x
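
Examples

A minimal illustrative call (a sketch, not from the package documentation), assuming heavy-tailed data simulated with rgp and the default grid of candidate thresholds:

xdat <- rgp(n = 1000, scale = 1, shape = 0.2)
# default candidate thresholds; plot L-kurtosis against L-skewness
thselect.alrs(xdat, plot = TRUE)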


Lower truncated Hill threshold selection

Description

Given a sample of positive data with a Pareto tail, the algorithm computes the optimal number of order statistics that minimizes the variance of the average left-truncated tail index estimator, and uses the relationship to the Hill estimator for the Hall class of distributions to derive the optimal number of order statistics (minimizing the asymptotic mean squared error) for the Hill estimator. The default value for the second order regular variation index is taken to be \rho = -1.

Usage

thselect.bab(
  xdat,
  kmin = floor(0.2 * length(xdat)),
  kmax = length(xdat) - 1L,
  rho = -1,
  test = FALSE,
  nsim = 999L,
  level = 0.95)

Arguments

xdat

[vector] positive vector of exceedances

kmin

[int] minimum number of exceedances

kmax

[int] maximum number of exceedances for the estimation of the shape parameter.

rho

[double] scalar for the second order regular variation index, a negative number.

test

[logical] if TRUE, computes the goodness-of-fit statistic for the model using Monte Carlo

nsim

[int] number of replications for the Monte Carlo test, used only if test=TRUE.

level

[double] confidence level for test

Value

a list with the number of order statistics for the Hill estimator, k0, and the corresponding shape estimate shape, the average lower-trimmed Hill estimator shape_lth and the number of order statistics upon which the latter is based, k0_lth.

References

Bladt, M., Albrecher, H. and Beirlant, J. (2020). Threshold selection and trimming in extremes. Extremes, 23, 629–665. doi:10.1007/s10687-020-00385-0
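
Examples

A sketch of a typical call (not from the package documentation), assuming Pareto-tailed exceedances simulated with rgp and the default \rho = -1:

xdat <- rgp(n = 1000, scale = 1, shape = 0.5)
# optimal number of order statistics and corresponding Hill estimate
thselect.bab(xdat)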


Threshold selection by shape mean square error minimization

Description

Use a semiparametric bootstrap to calculate the mean squared error of the shape parameter estimated by maximum likelihood at different thresholds, and return the one that minimizes the mean squared error.

Usage

thselect.cbm(xdat, thresh, B = 100)

Arguments

xdat

vector of observations

thresh

vector of thresholds

B

number of bootstrap replications

Value

an object of class mev_thselect_cbm containing

References

Caers, J., Beirlant, J. and Maes, M.A. (1999). Statistics for Modeling Heavy Tailed Distributions in Geology: Part I. Methodology. Mathematical Geology, 31, 391–410. <doi:10.1023/A:1007538624271>

Examples

set.seed(2025)
xdat <- rnorm(1000)
thresh <- qnorm(c(0.8, 0.9, 0.95))
thselect.cbm(xdat, thresh, B = 50)

Threshold selection via coefficient of variation

Description

This function computes the empirical coefficient of variation and a weighted statistic comparing the squared distance with the theoretical coefficient of variation corresponding to a specific shape parameter (estimated from the data using a moment estimator, as the value minimizing the test statistic, or using maximum likelihood). The procedure stops if there are no more than 10 exceedances above the highest threshold.

Usage

thselect.cv(
  xdat,
  thresh,
  method = c("mle", "wcv", "cv"),
  nsim = 999L,
  nthresh = 10L,
  level = 0.05,
  lazy = FALSE,
  plot = FALSE)

Arguments

xdat

[vector] vector of observations

thresh

[vector] vector of thresholds. If missing, set to p^k for k=0 to k=nthresh

method

[string], either moment estimator for the (weighted) coefficient of variation (wcv and cv) or maximum likelihood (mle)

nsim

[integer] number of bootstrap replications

nthresh

[integer] number of thresholds, ifthresh is not supplied by the user

level

[numeric] probability level for sequential testing procedure

lazy

[logical] compute the bootstrap p-value until the test stops rejecting at level level? Default to FALSE

plot

[logical] if TRUE, returns a plot of the p-value path

Value

a list with elements

Note

The authors suggest transformation of

Y = -1/(X + c) + 1/c,

where X are exceedances and c = \sigma/\xi is the ratio of estimated scale and shape parameters. For heavy-tailed distributions with \xi > 0.25, this may be preferable, but must be conducted outside of the function.

References

del Castillo, J. and M. Padilla (2016). Modelling extreme values by the residual coefficient of variation, SORT, 40(2), pp. 303–320.

Examples

thselect.cv(
  xdat = rgp(1000),
  thresh = qgp(seq(0, 0.9, by = 0.1)),
  nsim = 99,
  lazy = TRUE,
  plot = TRUE)

Threshold selection based on extended generalized Pareto models

Description

Fit an EGP model to data over a range of candidate thresholds thresh and perform likelihood-based tests of equality for \kappa = c, where c = 1 for all regular models and c = 0 for the 'gj-tnorm' and 'logist' models, for which the generalized Pareto special case corresponds to a value of \kappa occurring on the boundary of the parameter space.

Usage

thselect.egp(
  xdat,
  thresh,
  model = c("pt-beta", "pt-gamma", "pt-power", "gj-tnorm", "gj-beta", "exptilt",
    "logist"),
  type = c("wald", "lrt"),
  level = 0.95,
  transform = FALSE,
  plot = FALSE,
  ...)

Arguments

xdat

vector of observations, greater than the threshold

thresh

threshold value

model

a string indicating which extended family to fit

type

choice of test statistic, either wald for Wald-based intervals, or lrt for the profile likelihood ratio test.

level

[double] confidence interval level, default to 0.95.

transform

logical; if TRUE and type="wald", intervals for kappa are computed on the log-scale and back-transformed.

plot

[logical] if TRUE, return a plot of p-values against threshold

...

additional arguments, passed to plotting routine

Details

The threshold selection procedure returns chi-square statistics (stat) for Wald or profile likelihood ratio tests, along with p-values (pval) obtained from the large-sample distribution. The threshold returned is the lowest one for which all higher thresholds fail to reject the null hypothesis \kappa = c, or equivalently of a generalized Pareto tail.

Value

an invisible list of class mev_thselect_egp with elements

Examples

ths <- thselect.egp(
  xdat = rexp(1000),
  thresh = qexp(c(0.8, 0.9, 0.95)),
  model = "pt-power")
print(ths)
plot(ths)

Generalized quantile threshold selection

Description

The methodology proposed by Beirlant, Vynckier and Teugels (1996) uses an asymptotic expansion of the mean squared error of Hill's estimator, given a random sample with Pareto tails and positive shape, using an exponential regression. The value of k is selected to minimize the mean squared error given the optimal weighting scheme. This depends on the order of regular variation \rho, which is obtained based on the slope of the difference in Hill estimators, suitably reweighted. The iterative procedure of Beirlant et al. alternates between parameter estimation steps until convergence. It returns the Hill shape estimate, the number of higher order statistics, the parameter rho and estimates of the standard error of the shape and the mean squared error, based on the ultimate parameter values. Since the weights can become negative, there is no guarantee that the mean squared error estimate is positive, nor that the estimated value of \rho is nonpositive.

Usage

thselect.expgqt(
  xdat,
  maxiter = 10L,
  tol = 2,
  kmin = max(10, floor(length(xdat)/100)),
  kmax = floor(0.8 * length(xdat)),
  ...)

Arguments

xdat

[vector] sample of exceedances

maxiter

[int] maximum number of iterations

tol

[double] tolerance for the difference in value of k for the fixed point

kmin

[int] minimum number of exceedances for the estimator

kmax

[int] maximum number of exceedances for the estimator

...

additional arguments, currently ignored

Value

a list with components

References

Beirlant, J., Vynckier, P. and Teugels, J. L. (1996). Excess Functions and Estimation of the Extreme-Value Index. Bernoulli, 2(4), 293–318. doi:10.2307/3318416

Examples

# Simulate Pareto data - log(xdat) is exponential with rate 2
xdat <- rgp(n = 200, loc = 1, scale = 0.5, shape = 0.5)
(thselect.expgqt(xdat))

Kernel-based threshold selection of Goegebeur, Beirlant and de Wet (2008)

Description

Kernel-based threshold selection of Goegebeur, Beirlant and de Wet (2008)

Usage

thselect.gbw(
  xdat,
  kmax,
  kernel = c("Jackson", "Lewis"),
  rho = c("gbw", "ghp", "fagh", "dk"),
  ...)

Arguments

xdat

[vector] sample exceedances

kmax

[int] maximum number of exceedances considered

kernel

[string] kernel choice, one of Jackson or Lewis

rho

string for the estimator of the second order regular variation. Can also be a negative scalar

...

additional arguments, for backward compatibility purposes

Value

a list with elements

References

Goegebeur, Y., Beirlant, J. and de Wet, T. (2008). Linking Pareto-Tail Kernel Goodness-of-fit Statistics with Tail Index at Optimal Threshold and Second Order Estimation. REVSTAT-Statistical Journal, 6(1), 51–69. <doi:10.57805/revstat.v6i1.57>

Examples

xdat <- rgp(n = 1000, scale = 2, shape = 0.5)
(thselect.gbw(xdat, kmax = 500))

Mahalanobis distance-based methodology

Description

Compute the Mahalanobis distance-based threshold method over a grid of thresholds by transforming data from generalized Pareto to unit exponential based on probability weighted moment estimates, then computing the first L-moment and the L-skewness. The latter are compared to the theoretical counterparts from a unit exponential sample of the same size, which is used to compute the Mahalanobis distance. The threshold returned is the one which minimizes the distance.

Usage

thselect.ksmd(xdat, thresh, approx = c("asymptotic", "mc"), nsim = 1000L)

Arguments

xdat

[numeric] vector of observations

thresh

[numeric] vector of candidate thresholds. If missing, 20 sample quantiles starting at the 0.25 quantile, in increments of 3.75 percent, are used.

approx

[string] method used to obtain the moments of the first L-moment, either asymptotic or Monte Carlo (mc)

nsim

[integer] number of replications for Monte Carlo approximation

Value

a list with components

References

Kiran, K. G. and Srinivas, V. V. (2021). A Mahalanobis distance-based automatic threshold selection method for peaks over threshold model. Water Resources Research, 57. <doi:10.1029/2020WR027534>
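
Examples

An illustrative call (a sketch, not from the package documentation), assuming generalized Pareto data simulated with rgp and the asymptotic approximation for the moments of the first L-moment:

xdat <- rgp(n = 1000, scale = 2, shape = 0.1)
# default candidate thresholds, asymptotic approximation
thselect.ksmd(xdat, approx = "asymptotic")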


Minimum distance threshold selection procedure

Description

Minimum distance threshold selection procedure

Usage

thselect.mdps(xdat)

Arguments

xdat

vector of positive exceedances

Value

a list with components

k0

order statistic corresponding to threshold (number of exceedances)

shape

Hill's estimator of the tail index based on k0 exceedances

thresh0

numerical value of the threshold, the n-k0+1 order statistic of the original sample

References

Clauset, A., Shalizi, C.R. and Newman, M.E.J. (2009). Power-Law Distributions in Empirical Data. SIAM Review, 51, 661–703. doi:10.1137/070710111
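
Examples

A minimal illustrative call (not from the package documentation), assuming positive exceedances with a Pareto tail simulated with rgp:

xdat <- rgp(n = 1000, scale = 1, shape = 0.5)
# number of exceedances, Hill estimate and selected threshold
thselect.mdps(xdat)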


Automated mean residual life plots

Description

This function implements the automated proposal from Section 2.2 of Langousis et al. (2016) for mean residual life plots. It returns the threshold that minimizes the weighted mean squared error, along with moment estimators for the scale and shape parameters based on weighted least squares.

Usage

thselect.mrl(xdat, thresh, kmax, plot = TRUE, ...)

Arguments

xdat

[numeric] vector of observations

thresh

[numeric] vector of thresholds; if missing, uses all order statistics from the 20th largest until kmax as candidates

kmax

[integer] maximum number of order statistics

plot

[logical] if TRUE (default), return the mean residual life plot with the fitted slope and the chosen threshold

...

additional arguments, currently ignored

Details

The procedure consists of estimating the usual mean residual life as a function of the threshold, and looking for an order statistic or threshold value above which the fit is approximately linear.

Value

a list containing

References

Langousis, A., A. Mamalakis, M. Puliga and R. Deidda (2016). Threshold detection for the generalized Pareto distribution: Review of representative methods and application to the NOAA NCDC daily rainfall database, Water Resources Research, 52, 2659–2681.

Examples

thselect.mrl(rgp(n = 100))

Northrop and Coleman piecewise generalized Pareto threshold selection diagnostic

Description

The function tests the null hypothesis of a generalized Pareto distribution above each threshold in thresh against the alternative of a piecewise generalized Pareto model with continuity constraints.

Usage

thselect.ncpgp(xdat, thresh, test = "score", plot = FALSE, level = 0.95, ...)

Arguments

xdat

[vector] observations

thresh

[vector] candidate thresholds

test

[string] indicating whether to perform a score test or a likelihood ratio (lr) test. The latter requires fitting the alternative model, and so is more computationally expensive.

plot

[logical]; if TRUE, return a plot with the p-value path.

level

[double] confidence level for confidence interval, defaults to 0.95

...

additional arguments, for backward compatibility purposes

Value

an object of class mev_thselect_ncpgp containing the test statistic (stat), the p-values (pval), the threshold candidates (thresh) and the selected threshold (thresh0).
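
Examples

An illustrative call (a sketch, not from the package documentation), using simulated exponential data, a few quantile-based candidate thresholds and the default score test:

xdat <- rexp(1000)
thresh <- quantile(xdat, c(0.5, 0.75, 0.9))
thselect.ncpgp(xdat, thresh = thresh, plot = TRUE)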


Prediction error C-criterion threshold selection method

Description

This function computes the non-robust Pareto prediction error of Dupuis and Victoria-Feser (2003), termed C-criterion, for the Hill estimator of the shape parameter. The threshold returned is the value of the threshold, taken from order statistics, that minimizes the average prediction error.

Usage

thselect.pec(xdat, kmax)

Arguments

xdat

vector of observations

kmax

maximum number of order statistics to consider. Default to sample size if left unspecified.

Value

a list with the number of exceedances k, the chosen threshold thresh0 and the corresponding Hill estimator shape estimate shape.

References

Dupuis, D.J. and M.-P. Victoria-Feser (2003). A Prediction Error Criterion for Choosing the Lower Quantile in Pareto Index Estimation, University of Geneva, technical report, https://archive-ouverte.unige.ch/unige:5789.
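
Examples

A minimal illustrative call (not from the package documentation), assuming Pareto-tailed data simulated with rgp:

xdat <- rgp(n = 1000, scale = 1, shape = 0.5)
# restrict the search to the 500 largest order statistics
thselect.pec(xdat, kmax = 500)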


Pickands' order statistics threshold selection method

Description

Restricting to the largest fourth of the data, returns the number of exceedances that minimizes the Kolmogorov-Smirnov statistic, i.e., the maximum absolute difference between the estimated generalized Pareto and the empirical distribution of exceedances. Relative to the paper, different estimation methods are proposed.

Usage

thselect.pickands(xdat, thresh, method = c("mle", "lmom", "quartiles"))

Arguments

xdat

[numeric] vector of observations

thresh

[numeric] vector of candidate thresholds. If missing, defaults to order statistics from the 10th to a quarter of the sample size.

method

[string] estimation method, either the quartiles of Pickands (1975), maximum likelihood, probability weighted moments or L-moments

Value

a list with components

Note

The quartiles estimator of Pickands is robust, but very inefficient. It is provided for historical reasons.

References

James Pickands III (1975). Statistical inference using extreme order statistics, Annals of Statistics, 3(1), 119–131, doi:10.1214/aos/1176343003
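
Examples

An illustrative comparison (a sketch, not from the package documentation) of the maximum likelihood and quartile-based variants on simulated generalized Pareto data:

xdat <- rgp(n = 1000, scale = 1, shape = 0.2)
thselect.pickands(xdat, method = "mle")
thselect.pickands(xdat, method = "quartiles")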


Threshold selection for the random block maxima method

Description

Threshold selection for the random block maxima method

Usage

thselect.rbm(xdat, kmax = length(xdat))

Arguments

xdat

[vector] sample exceedances

kmax

maximum number of exceedances to consider.

Value

a list with elements
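
Examples

A minimal illustrative call (not from the package documentation), assuming positive Pareto-tailed exceedances simulated with rgp:

xdat <- rgp(n = 1000, scale = 1, shape = 0.5)
thselect.rbm(xdat)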


Threshold selection via SAMSEE

Description

Smooth asymptotic mean squared error estimator of Schneider et al. (2021) for threshold selection. The implementation uses a second-order regular variation index of -1.

Usage

thselect.samsee(xdat)

Arguments

xdat

vector of positive exceedances

Value

a list with elements

k0

optimal number of exceedances

shape

Hill estimator of the tail index

thresh0

selected threshold

References

Schneider, L.F., Krajina, A. and Krivobokova, T. (2021). Threshold selection in univariate extreme value analysis, Extremes, 24, 881–913. doi:10.1007/s10687-021-00405-7
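
Examples

A minimal illustrative call (not from the package documentation), assuming positive exceedances with a Pareto tail simulated with rgp:

xdat <- rgp(n = 2000, scale = 1, shape = 0.5)
# optimal number of exceedances, Hill estimate and threshold
thselect.samsee(xdat)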


Threshold selection diagnostic of Suveges and Davison

Description

The information matrix test (IMT), proposed by Suveges and Davison (2010), is based on the difference between the expected quadratic score and the second derivative of the log-likelihood. The test statistic for each threshold u and gap K is asymptotically \chi^2 with one degree of freedom. The approximation is good for N > 80 and conservative for smaller sample sizes. The test assumes independence between gaps.

Usage

thselect.sdinfo(xdat, thresh, qlev, plot = FALSE, kmax = 1, k = 1)

Arguments

xdat

[vector] vector of observations

thresh

[vector] candidate thresholds

qlev

[vector] probability levels to define thresholds if thresh is missing.

plot

[logical]; should the graphical diagnostic be plotted?

kmax

[int] the largest K-gap under consideration for clusters

k

[int] the K-gap for automatic threshold selection

Details

The procedure proposed in Suveges & Davison (2010) was corrected for errata. The maximum likelihood is based on the limiting mixture distribution of the intervals between exceedances (an exponential with a point mass at zero). The condition D^{(K)}(u_n) should be checked by the user.

Fukutome et al. (2015) propose an ad hoc automated procedure:

  1. Calculate the interexceedance times for each K-gap and each threshold, along with the number of clusters

  2. Select the (u,K) pairs for which IMT < 0.05 (corresponding to a P-value of 0.82)

  3. Among those, select the pair (u,K) for which the number of clusters is the largest

Value

an invisible list with elements

Author(s)

Leo Belzile

References

Fukutome, Liniger and Suveges (2015), Automatic threshold and run parameter selection: a climatology for extreme hourly precipitation in Switzerland. Theoretical and Applied Climatology, 120(3), 403–416.

Suveges and Davison (2010), Model misspecification in peaks over threshold analysis. Annals of Applied Statistics, 4(1), 203–221.

White (1982), Maximum Likelihood Estimation of Misspecified Models. Econometrica, 50(1), 1–25.

Examples

thselect.sdinfo(
  xdat = rgp(n = 10000),
  qlev = seq(0.1, 0.9, length = 10),
  kmax = 3)

Metric-based threshold selection

Description

Adaptation of Varty et al.'s metric-based automated threshold diagnostic for the independent and identically distributed case with no rounding.

Usage

thselect.vmetric(
  xdat,
  thresh,
  B = 199L,
  type = c("eqd", "exp", "qq", "pp"),
  dist = c("l1", "l2"),
  uq = FALSE,
  neval = 1000L,
  level = 0.95,
  plot = FALSE,
  ...)

Arguments

xdat

vector of observations

thresh

vector of thresholds

B

number of bootstrap replications

type

string indicating scale, either exp for the exponential quantile-quantile plot as in Varty et al. (2021) or pp for the probability-probability plot (uniform). The method qq or expected quantile discrepancy (eqd) corresponds to Murphy et al. (2024), with the quantiles on the generalized Pareto scale.

dist

string indicating norm, eitherl1 for absolute error orl2 for quadratic error

uq

logical; if TRUE, generate bootstrap samples accounting for the sampling distribution of parameters

neval

number of points at which to estimate the metric. Default to 1000L

level

level of symmetric confidence interval. Default to 0.95

plot

logical; ifTRUE, returns a plot

...

additional arguments for backward compatibility

Details

The algorithm proceeds by first computing the maximum likelihood estimate and then simulating datasets by replication, with parameters drawn from a bivariate normal approximation to the distribution of the maximum likelihood estimator.

For each bootstrap sample, we refit the model and convert the quantiles to exponential or uniform variates. The mean absolute or mean squared distance is calculated on these. The threshold returned is the one with the lowest value of the metric.

Value

an invisible list with components

References

Varty, Z. and J.A. Tawn and P.M. Atkinson and S. Bierman (2021+), Inference for extreme earthquake magnitudes accounting for a time-varying measurement process.

Murphy, C., Tawn, J. A. and Varty, Z. (2024). Automated Threshold Selection and Associated Inference Uncertainty for Univariate Extremes. Technometrics, 67(2), 215–224. <doi:10.1080/00401706.2024.2421744>

Examples

xdat <- rexp(1000, rate = 1/2)
thresh <- quantile(xdat, prob = c(0.25, 0.5, 0.75))
thv <- thselect.vmetric(xdat, thresh, B = 99)
plot(thv)
print(thv)

Wadsworth's sequential analysis threshold selection

Description

Function to produce diagnostic plots and test statistics for the threshold diagnostics exploiting structure of maximum likelihood estimators based on the non-homogeneous Poisson process likelihood or the coefficient of tail dependence.

Usage

thselect.wseq(
  xdat,
  thresh,
  qlev,
  model = c("nhpp", "taildep", "rtaildep"),
  npp = 1,
  nsim = 1000L,
  level = 0.95,
  plot = FALSE,
  ...)

Arguments

xdat

a numeric vector or matrix of data to be fitted.

thresh

vector of candidate thresholds.

qlev

vector of probabilities for the empirical quantiles used in place of thresholds when argument thresh is missing.

model

string specifying whether the univariate or multivariate diagnostic should be used. Either nhpp for the univariate model, or taildep (rtaildep) for the bivariate exponential model with rate (inverse rate) parametrization. See Details.

npp

number of observations per period for the non-homogeneous point process model. Default to 1.

nsim

number of Monte Carlo simulations used to assess the null distribution of the test statistic

level

confidence level of intervals, defaults to 0.95

plot

logical; ifTRUE, calls the plot routine

...

additional parameters passed to internal routine

Details

The function is a wrapper for the univariate (non-homogeneous Poisson process model) and exponential dependence model applied to the minimum component (tail dependence coefficient). For the latter, the user can select either the rate ("taildep") or inverse rate ("rtaildep") parametrization. The inverse rate parametrization works better for uniformity of the p-value distribution under the likelihood ratio test for the changepoint.

For the coefficient of tail dependence, users must provide the pairwise minima of marginally exponentially distributed data (see Examples).

Value

an invisible list with components

Author(s)

Jennifer L. Wadsworth, Léo Belzile

References

Wadsworth, J.L. (2016). Exploiting Structure of Maximum Likelihood Estimators for Extreme Value Threshold Selection, Technometrics, 58(1), 116–126, http://dx.doi.org/10.1080/00401706.2014.998345.

Examples

## Not run: 
set.seed(123)
xdat <- abs(rnorm(5000))
thresh <- quantile(xdat, seq(0, 0.9, by = 0.1))
(diag <- thselect.wseq(
  xdat = xdat,
  thresh = thresh,
  plot = TRUE,
  type = "ps"))
# Multivariate example, with coefficient of tail dependence
xbvn <- rmnorm(n = 6000L,
               mu = rep(0, 2),
               Sigma = cbind(c(1, 0.7), c(0.7, 1)))
thselect.wseq(
  xdat = xbvn,
  qlev = seq(0, 0.9, length.out = 30),
  model = 'taildep',
  plot = TRUE)
## End(Not run)

Coefficient of variation threshold stability plot

Description

This function calculates parametric estimates of the coefficient of variation with pointwise Wald confidence intervals along with empirical estimates and returns a threshold stability plot.

Usage

tstab.cv(
  xdat,
  thresh,
  method = c("empirical", "mle", "wcv", "cv"),
  nthresh = 10L,
  nsim = 99L,
  plot = TRUE,
  level = 0.95,
  ...)

Arguments

xdat

[vector] vector of observations

thresh

[vector] vector of thresholds. If missing, set to p^k for k=0 to k=nthresh

method

[string], either moment estimator for the (weighted) coefficient of variation (wcv and cv) or maximum likelihood (mle)

nthresh

[integer] number of thresholds, ifthresh is not supplied by the user

nsim

[integer] number of bootstrap replications

plot

[logical] if TRUE, returns a plot of the p-value path

level

[numeric] probability level for sequential testing procedure

...

additional parameters, notably for package boot, for the type of confidence intervals.

Examples

tstab.cv(
   xdat = rgp(1000),
   thresh = qgp(seq(0, 0.9, by = 0.1)),
   method = "cv")
tstab.cv(
   xdat = rgp(1000),
   thresh = qgp(seq(0, 0.9, by = 0.1)),
   method = "empirical")

Threshold stability plots for extended generalized Pareto models

Description

Threshold stability plots for extended generalized Pareto models

Usage

tstab.egp(
  xdat,
  thresh,
  model = c("pt-beta", "pt-gamma", "pt-power", "gj-tnorm", "gj-beta", "exptilt",
    "logist"),
  param = c("shape", "kappa"),
  type = c("wald", "lrt"),
  transform = FALSE,
  level = 0.95,
  plot = TRUE,
  ...)

Arguments

xdat

vector of observations, greater than the threshold

thresh

threshold value

model

a string indicating which extended family to fit

param

[string] parameter, either shape or additional parameter kappa

type

[string] confidence interval type, either wald or lrt (profile likelihood ratio).

transform

logical; if TRUE and type="wald", intervals for kappa are computed on the log-scale and back-transformed.

level

[double] confidence interval level, default to 0.95.

plot

[logical] if TRUE (default), return a threshold stability plot

...

additional arguments for the plot function, currently ignored.

Value

an invisible list object of class mev_egp_tstab with elements

Examples

xdat <- rgp(n = 1000)
tstab.egp(
  xdat = xdat,
  thresh = c(0, quantile(xdat, 0.5)),
  model = "gj-tnorm",
  param = "kappa",
  transform = TRUE)

Parameter stability plots for peaks-over-threshold

Description

This function computes the maximum likelihood estimate at each provided threshold and plots the estimates (pointwise), along with 95% confidence/credible intervals obtained using Wald or profile confidence intervals, or else from 1000 independent draws from the posterior distribution under vague independent normal priors on the log-scale and shape. The latter two methods better reflect the asymmetry of the estimates than the Wald confidence intervals.

Usage

tstab.gpd(  xdat,  thresh,  method = c("wald", "lrt", "post"),  level = 0.95,  plot = TRUE,  which = c("scale", "shape"),  changepar = TRUE,  ...)

Arguments

xdat

a vector of observations

thresh

a vector of candidate thresholds at which to compute the estimates.

method

string indicating the method for computing confidence or credible intervals. Must be one of "wald", "lrt" (profile likelihood) or "post".

level

confidence level of the intervals. Default to 0.95.

plot

logical; should parameter stability plots be displayed? Default to TRUE.

which

character vector with elementsscale orshape

changepar

logical; if TRUE, changes the graphical parameters.

...

additional arguments passed to plot.

Value

a list with components

plots of the modified scale and shape parameters, with pointwise confidence/credible intervals, and an invisible data frame containing the threshold thresh and the modified scale and shape parameters.

Note

The function is hard coded to prevent fitting a generalized Pareto distribution to samples of size less than 10. If the estimated shape parameters are all on the boundary of the parameter space (meaning \hat{\xi} = -1), then the plots return one-sided confidence intervals for both the modified scale and shape parameters: these typically suggest that the chosen thresholds are too high for estimation to be reliable.

Author(s)

Leo Belzile

See Also

gpd.fitrange

Examples

dat <- abs(rnorm(10000))
u <- qnorm(seq(0.9, 0.99, by = 0.01))
par(mfrow = c(1,2))
tstab.gpd(xdat = dat, thresh = u, changepar = FALSE)
## Not run: 
tstab.gpd(xdat = dat, thresh = u, method = "lrt")
tstab.gpd(xdat = dat, thresh = u, method = "post")
## End(Not run)

Threshold stability plot for Hill estimator

Description

Threshold stability plot for Hill estimator

Usage

tstab.hill(xdat, kmax, method = "hill", ..., log = TRUE)

Arguments

xdat

[vector] sample exceedances

kmax

[int] maximum number of order statistics

method

[string] name of estimator for the shape parameter. Default to hill.

...

additional arguments passed to fit.shape for certain methods.

log

[logical] should the x-axis for the number of order statistics used for estimation be displayed on the log scale? Default to TRUE

Value

a plot of shape estimates as a function of the number of exceedances

Examples

xdat <- rgp(n = 250, loc = 1, scale = 2, shape = 0.5)
tstab.hill(xdat)

Threshold stability plots for left-truncated Hill estimators

Description

Given a vector of exceedances and some potential choices of k for the threshold, compute the left-truncated Hill estimators for each value of k and use these to compute the variance and slope of the estimator.

Usage

tstab.lthill(xdat, k, which = c("lthill", "var", "slope"), log = TRUE, ...)

Arguments

xdat

[numeric] vector of positive observations

k

[integer] number of order statistics for the threshold

which

[string] the type of plot, showing the left-truncated Hill plot on the log scale, the log of the variance of the estimator, or the log slope

log

[logical] if TRUE (default), shows the Hill plot on the log-scale

...

additional parameters for color, etc. to be passed to plot

Value

an invisible list with lthill, order statistics, the log variance and the log scale.

References

Bladt, M., Albrecher, H. and Beirlant, J. (2020). Threshold selection and trimming in extremes. Extremes, 23, 629–665. doi:10.1007/s10687-020-00385-0

Examples

xdat <- 10/(1 - runif(n = 1000)) - 10
tstab.lthill(xdat = xdat, k = c(50, 100, 200))

Mean residual life plot

Description

Computes mean of sample exceedances over a range of thresholds or for a pre-specified number of largest order statistics, and returns a plot with 95% Wald-based confidence intervals as a function of either the threshold or the number of exceedances. The main purpose is the plotting method, which generates the so-called mean residual life plot. The latter should be approximately linear over the threshold for a generalized Pareto distribution.

Usage

tstab.mrl(
  xdat,
  thresh,
  kmin = 10L,
  kmax = length(xdat),
  plot = TRUE,
  level = 0.95,
  xlab = c("thresh", "nexc"),
  type = c("band", "ptwise"),
  ...)

Arguments

xdat

vector of sample observations

thresh

vector of thresholds

kmin

integer giving the minimum number of exceedances; ignored if thresh is provided. Default to 10

kmax

integer giving the maximum number of exceedances; ignored if thresh is provided. Default to sample size.

plot

logical; if TRUE, call the plot method

level

double giving the level of confidence intervals for the plot, default to 0.95

xlab

string indicating whether to use thresholds (thresh) or number of largest order statistics (nexc) for the x-axis

type

string indicating whether to plot pointwise confidence intervals using segments ("ptwise") or using dashed lines ("band")

...

additional arguments, currently ignored

Value

an invisible list with mean sample exceedances and standard deviation, number of exceedances, threshold

References

Davison, A.C. and R.L. Smith (1990). Models for Exceedances over High Thresholds (with discussion), Journal of the Royal Statistical Society. Series B (Methodological), 52(3), 393–442.

Examples

tstab.mrl(
  xdat = rgp(n = 100, shape = -0.5),
  xlab = "thresh",
  kmax = 50)
tstab.mrl(
  rexp(100),
  thresh = qexp(seq(0, 0.9, by = 0.01)))

Venice Sea Levels

Description

The venice data contains the 10 largest yearly sea levels (in cm) from 1887 until 2019. Only the yearly maximum is available for 1922, and the six largest observations for 1936.

Format

a data frame with 133 rows and 11 columns containing the year of the measurement (first column) and the ordered 10 largest yearly observations, reported in decreasing order from largest (r1) to smallest (r10).

Note

Smith (1986) notes that the annual maxima seem to fluctuate around a constant sea level up to 1930 or so, after which there is a potential linear trend. Records of threshold exceedances above 80 cm (reported on the website) indicate that observations are temporally clustered.

The observations from 1931 until 1981 can be found in Table 1 of Smith (1986), who reported data from Pirazzoli (1982). The values from 1983 until 2019 were extracted by Anthony Davison from the City of Venice website (accessed in May 2020) and are licensed under the CC BY-NC-SA 3.0 license. The Venice City website indicates that later measurements were recorded by an instrument located in Punta Salute.

Source

City of Venice, Historical archive <https://www.comune.venezia.it/node/6214>. Last accessed November 5th, 2020.

References

Smith, R. L. (1986). Extreme value theory based on the r largest annual events. Journal of Hydrology, 86, 27–43.

Pirazzoli, P. (1982). Maree estreme a Venezia (periodo 1872-1981). Acqua Aria, 10, 1023–1039.

Coles, S. G. (2001)An Introduction to Statistical Modelling of Extreme Values. London: Springer.

See Also

venice


Best 200 times of Women 1500m Track

Description

The 200 all-time best performances (in seconds) in the women's 1500-meter run.

Format

a vector of size 200

Source

<http://www.alltime-athletics.com/w_1500ok.htm>, accessed 14.08.2018


Coefficient of extremal asymmetry

Description

This function implements estimators of the bivariate coefficient of extremal asymmetry proposed in Semadeni's (2021) PhD thesis. Two estimators are implemented: one based on empirical distributions, the second using empirical likelihood.

Usage

xasym(
  xdat,
  qlev = NULL,
  nq = 40,
  qlim = c(0.8, 0.99),
  estimator = c("emp", "elik"),
  confint = c("none", "wald", "bootstrap"),
  level = 0.95,
  B = 999L,
  ties.method = "random",
  plot = TRUE,
  ...)

Arguments

xdat

an n by 2 matrix of observations

qlev

vector of quantile levels at which to evaluate extremal asymmetry

nq

integer; number of quantiles at which to evaluate the coefficient if u is NULL

qlim

a vector of length 2 with the probability limits for the quantiles

estimator

string indicating the estimation method, one of emp for the empirical estimator or elik for empirical likelihood

confint

string for the method used to derive confidence intervals, either none (default) or a nonparametric bootstrap

level

probability level for confidence intervals, default to 0.95 or bounds for the interval

B

integer; number of bootstrap replicates (if applicable)

ties.method

string; method for handling ties. See the documentation of rank for available options.

plot

logical; if TRUE, return a plot.

...

additional parameters for plots

Details

Let U, V be uniform random variables and define the partial extremal dependence coefficients

\varphi_{+}(u) = \Pr(V > U \mid U > u, V > u),

\varphi_{-}(u) = \Pr(V < U \mid U > u, V > u),

\varphi_0(u) = \Pr(V = U \mid U > u, V > u).

Define

\varphi(u) = \frac{\varphi_{+} - \varphi_{-}}{\varphi_{+} + \varphi_{-}}

and the coefficient of extremal asymmetry as \varphi = \lim_{u \to 1} \varphi(u).

The empirical likelihood estimator, derived for max-stable vectors with unit Frechet margins, is

\widehat{\varphi}_{\mathrm{el}} = \frac{\sum_i p_i \mathrm{I}(w_i \leq 0.5) - 0.5}{0.5 - 2\sum_i p_i(0.5-w_i) \mathrm{I}(w_i \leq 0.5)}

where p_i is the empirical likelihood weight for observation i, \mathrm{I} is an indicator function and w_i is the pseudo-angle associated to the first coordinate, derived based on exceedances above u.

Value

an invisible data frame with columns

threshold

vector of thresholds on the probability scale

coef

extremal asymmetry coefficient estimates

confint

either NULL or a matrix with two columns containing the lower and upper bounds for each threshold

References

Semadeni, C. (2020). Inference on the Angular Distribution of Extremes, PhD thesis, EPFL, no. 8168.

Examples

## Not run: 
samp <- rmev(n = 1000,
             d = 2,
             param = 0.2,
             model = "log")
xasym(samp, confint = "wald")
xasym(samp, estimator = "elik")
## End(Not run)

Coefficient of extremal asymmetry

Description

This function implements estimators of the bivariate coefficient of extremal asymmetry proposed in Semadeni's (2021) PhD thesis. Two estimators are implemented: one based on empirical distributions, the second using empirical likelihood.

Usage

xdep.asym(
  xdat,
  qlev = NULL,
  nq = 40,
  qlim = c(0.8, 0.99),
  estimator = c("emp", "elik"),
  confint = c("none", "wald", "bootstrap"),
  level = 0.95,
  B = 999L,
  ties.method = "random",
  plot = TRUE,
  ...)

Arguments

xdat

an n by 2 matrix of observations

qlev

vector of quantile levels at which to evaluate extremal asymmetry

nq

integer; number of quantiles at which to evaluate the coefficient if u is NULL

qlim

a vector of length 2 with the probability limits for the quantiles

estimator

string indicating the estimation method, one of emp or empirical likelihood (elik)

confint

string for the method used to derive confidence intervals, either none (default) or a nonparametric bootstrap

level

probability level for confidence intervals, default to 0.95 or bounds for the interval

B

integer; number of bootstrap replicates (if applicable)

ties.method

string; method for handling ties. See the documentation of rank for available options.

plot

logical; if TRUE, return a plot.

...

additional arguments for backward compatibility

Details

Let U, V be uniform random variables and define the partial extremal dependence coefficients

\varphi_{+}(u) = \Pr(V > U \mid U > u, V > u),

\varphi_{-}(u) = \Pr(V < U \mid U > u, V > u),

\varphi_0(u) = \Pr(V = U \mid U > u, V > u).

Define

\varphi(u) = \frac{\varphi_{+} - \varphi_{-}}{\varphi_{+} + \varphi_{-}}

and the coefficient of extremal asymmetry as \varphi = \lim_{u \to 1} \varphi(u).

The empirical likelihood estimator, derived for max-stable vectors with unit Frechet margins, is

\widehat{\varphi}_{\mathrm{el}} = \frac{\sum_i p_i \mathrm{I}(w_i \leq 0.5) - 0.5}{0.5 - 2\sum_i p_i(0.5-w_i) \mathrm{I}(w_i \leq 0.5)}

where p_i is the empirical likelihood weight for observation i, \mathrm{I} is an indicator function and w_i is the pseudo-angle associated to the first coordinate, derived based on exceedances above u.

Value

an invisible data frame with columns

qlev

quantile level of thresholds

coef

extremal asymmetry coefficient estimates

lower

either NULL or a vector containing the lower bound of the confidence interval

upper

either NULL or a vector containing the upper bound of the confidence interval

References

Semadeni, C. (2020). Inference on the Angular Distribution of Extremes, PhD thesis, EPFL, no. 8168.

Examples

## Not run: 
samp <- rmev(n = 1000,
             d = 2,
             param = 0.2,
             model = "log")
xdep.asym(samp, confint = "wald")
xdep.asym(samp, estimator = "elik", confint = "none")
## End(Not run)

Coefficient of tail correlation

Description

The coefficient of tail correlation \chi is

\chi = \lim_{u \to 1} \frac{\Pr(F_1(X_1)>u, \ldots, F_D(X_D)>u)}{1-u}.

Asymptotically independent vectors have \chi = 0. Estimation relies on an estimator of the survival copula.

Usage

xdep.chi(
  xdat,
  qlev = NULL,
  nq = 40,
  qlim = c(0.8, 0.99),
  estimator = c("emp", "betacop", "gpd", "hill"),
  confint = c("wald", "lrt"),
  level = 0.95,
  margtrans = c("emp", "none"),
  ties.method = "random",
  plot = TRUE,
  ...)

Arguments

xdat

an n by d matrix of multivariate observations

qlev

vector of percentiles between 0 and 1

nq

number of quantiles of the structural variable at which to form a grid; only used if u = NULL.

qlim

limits for the sequence u of the structural variable

estimator

string giving estimator to employ

confint

string indicating the type of confidence interval, one of"wald" or"lrt"

level

the confidence level required (default to 0.95).

margtrans

string giving the marginal transformation, one of emp for rank-based transformation or none if data are already on the uniform scale

ties.method

string indicating the type of method for rank; see rank for a list of options. Default to "random"

plot

logical; if TRUE, return a plot

...

additional arguments to taildep, currently ignored

Value

a data frame

Examples

## Not run: 
set.seed(765)
# Max-stable model
dat <- rmev(n = 1000, d = 2, param = 0.7, model = "log")
xdep.chi(dat, confint = 'wald')
## End(Not run)

Coefficient chi-bar

Description

For data with unit Pareto margins, the coefficient \bar{\chi} = 2\eta - 1 is defined via

\Pr(\min(X) > x) = L(x)x^{-1/\eta},

where L(x) is a slowly varying function. Ignoring the latter, several estimators of \eta can be defined. In unit Pareto margins, \eta is a nonnegative shape parameter that can be estimated by fitting a generalized Pareto distribution above a high threshold. In exponential margins, \eta is a scale parameter and the maximum likelihood estimator of the latter is the Hill estimator. Both methods are based on peaks-over-threshold and the user can choose between pointwise confidence intervals obtained through a likelihood ratio test statistic ("lrt") or the Wald statistic ("wald").

Usage

xdep.chibar(
  xdat,
  qlev = NULL,
  nq = 40,
  qlim = c(0.8, 0.99),
  estimator = c("emp", "betacop"),
  confint = c("wald", "lrt"),
  level = 0.95,
  margtrans = c("emp", "none"),
  ties.method = "random",
  plot = TRUE,
  ...)

Arguments

xdat

an n by d matrix of multivariate observations

qlev

vector of percentiles between 0 and 1

nq

number of quantiles of the structural variable at which to form a grid; only used if u = NULL.

qlim

limits for the sequence u of the structural variable

estimator

string giving estimator to employ

confint

string indicating the type of confidence interval, one of"wald" or"lrt"

level

the confidence level required (default to 0.95).

margtrans

string giving the marginal transformation, one of emp for rank-based transformation or none if data are already on the uniform scale

ties.method

string indicating the type of method for rank; see rank for a list of options. Default to "random"

plot

logical; if TRUE, return a plot

...

additional arguments to taildep, currently ignored

Details

The most common approach for estimation is the empirical survival copula, evaluating the proportion of sample minima with uniform margins that exceed a given x. An alternative estimator uses a smoothed estimator of the survival copula based on Bernstein polynomials, resulting in the so-called betacop estimator. Approximate pointwise confidence intervals for the latter are obtained by assuming the proportion of points is binomial.

Value

a data frame

References

Ledford, A.W. and J. A. Tawn (1996), Statistics for near independence in multivariate extreme values. Biometrika, 83(1), 169–187.


Examples

## Not run: 
set.seed(765)
# Max-stable model
dat <- rmev(n = 1000, d = 2, param = 0.7, model = "log")
xdep.chibar(dat, confint = 'wald')
## End(Not run)

Coefficient of tail dependence

Description

For data with unit Pareto margins, the coefficient of tail dependence \eta is defined via

\Pr(\min(X) > x) = L(x)x^{-1/\eta},

where L(x) is a slowly varying function. Ignoring the latter, several estimators of \eta can be defined. In unit Pareto margins, \eta is a nonnegative shape parameter that can be estimated by fitting a generalized Pareto distribution above a high threshold. In exponential margins, \eta is a scale parameter and the maximum likelihood estimator of the latter is the Hill estimator. Both methods are based on peaks-over-threshold and the user can choose between pointwise confidence intervals obtained through a likelihood ratio test statistic ("lrt") or the Wald statistic ("wald").

Usage

xdep.eta(
  xdat,
  qlev = NULL,
  nq = 40,
  qlim = c(0.8, 0.99),
  estimator = c("emp", "betacop", "gpd", "hill", "kj"),
  confint = c("wald", "lrt"),
  level = 0.95,
  margtrans = c("emp", "sp", "none"),
  ties.method = "random",
  plot = TRUE,
  mqlev = NULL,
  ...)

Arguments

xdat

an n by d matrix of multivariate observations

qlev

vector of percentiles between 0 and 1

nq

number of quantiles of the structural variable at which to form a grid; only used if u = NULL.

qlim

limits for the sequence u of the structural variable

estimator

string giving estimator to employ

confint

string indicating the type of confidence interval, one of"wald" or"lrt"

level

the confidence level required (default to 0.95).

margtrans

string giving the marginal transformation, one of emp for rank-based transformation or none if data are already on the uniform scale

ties.method

string indicating the type of method for rank; see rank for a list of options. Default to "random"

plot

logical; if TRUE, return a plot

mqlev

marginal quantile levels for semiparametric estimation for estimator kj; data above this are modelled using a generalized Pareto distribution. If missing, empirical estimation is used throughout

...

additional arguments to taildep, currently ignored

Details

The most common approach for estimation is the empirical survival copula, evaluating the proportion of sample minima with uniform margins that exceed a given x. An alternative estimator uses a smoothed estimator of the survival copula based on Bernstein polynomials, resulting in the so-called betacop estimator. Approximate pointwise confidence intervals for the latter are obtained by assuming the proportion of points is binomial.

Value

a data frame

References

Ledford, A.W. and J. A. Tawn (1996), Statistics for near independence in multivariate extreme values. Biometrika, 83(1), 169–187.

Examples

## Not run: 
set.seed(765)
# Max-stable model
dat <- rmev(n = 1000, d = 2, param = 0.7, model = "log")
xdep.eta(dat, confint = 'wald')
## End(Not run)

Extremal index estimators

Description

The function implements estimators of the extremal index based on interexceedance times and gaps of exceedances: the maximum likelihood and iteratively reweighted least squares estimators of Suveges (2007), as well as the intervals estimator. The implementation differs from the presentation of the paper in that an iteration limit is enforced to make sure the iterative procedure terminates. Multiple thresholds can be supplied.

Usage

xdep.xindex(
  xdat,
  qlev = 0.95,
  estimator = c("wls", "mle", "intervals"),
  confint = c("none", "wald", "lrt"),
  level = 0.95,
  plot = FALSE,
  warn = FALSE,
  ...)

Arguments

xdat

numeric vector of observations

qlev

a vector of quantile levels in (0,1) for estimation of the extremal index. Defaults to 0.95

estimator

a string specifying the chosen method (only one allowed). Must be either wls for weighted least squares, mle for maximum likelihood estimation or intervals for the intervals estimator of Ferro and Segers (2003). Partial match is allowed.

confint

string indicating the type of confidence interval, one of "wald" or "lrt" for estimator="mle", else "none"

level

the confidence level required (default to 0.95).

plot

logical; if TRUE, plot the extremal index as a function of q

warn

logical; if TRUE, receive a warning when the sample size is too small

...

additional arguments, for backward compatibility

Details

The iteratively reweighted least squares procedure is based on the gaps of exceedances S_n = T_n - 1. The model is first fitted to non-zero gaps, which are rescaled to have unit exponential scale. The slope between the theoretical quantiles and the normalized gaps of exceedances is b = 1/\theta, with intercept a = \log(\theta)/\theta. As such, the estimate of the extremal index is based on \hat{\theta} = \exp(\hat{a}/\hat{b}). The weights are chosen in such a way as to reduce the influence of the smallest values. The estimator exploits the dual role of \theta as the parameter of the mean for the interexceedance time as well as the mixture proportion for the non-zero component.

The maximum likelihood is based on an independence likelihood for the rescaled gaps of exceedances, namely \bar{F}(u_n)S(u_n). The score equation is equivalent to a quadratic equation in \theta and the maximum likelihood estimate is available in closed form. Its validity requires however condition D^{(2)}(u_n) to apply; this should be checked by the user beforehand.

A warning is emitted if the effective sample size is less than 50 observations.

Value

a data frame

Author(s)

Leo Belzile

References

Ferro and Segers (2003). Inference for clusters of extreme values, JRSS: Series B, 65(2), 545–556.

Suveges (2007). Likelihood estimation of the extremal index. Extremes, 10(1), 41–55.

Suveges and Davison (2010), Model misspecification in peaks over threshold analysis. Annals of Applied Statistics, 4(1), 203–221.

Fukutome, Liniger and Suveges (2015), Automatic threshold and run parameter selection: a climatology for extreme hourly precipitation in Switzerland. Theoretical and Applied Climatology, 120(3), 403–416.

Examples

set.seed(234)
# Moving maxima model with theta=0.5
a <- 1; theta <- 1/(1+a)
sim <- rgev(10001, loc = 1/(1+a), scale = 1/(1+a), shape = 1)
x <- pmax(sim[-length(sim)]*a, sim[-1])
q <- seq(0.9, 0.99, by = 0.01)
xdep.xindex(
  xdat = x,
  qlev = q,
  estimator = "mle",
  confint = "wald")
