Title: Reconstruct a Distribution from a Collection of Quantiles
Version: 1.0.4
Description: Given a set of predictive quantiles from a distribution, estimate the distribution and create 'd', 'p', 'q', and 'r' functions to evaluate its density function, distribution function, and quantile function, and generate random samples. On the interior of the provided quantiles, an interpolation method such as a monotonic cubic spline is used; the tails are approximated by a location-scale family.
License: GPL (≥ 3)
URL: http://reichlab.io/distfromq/
Imports: checkmate, purrr, splines, stats, utils, zeallot
Suggests: dplyr, ggplot2, knitr, rmarkdown, testthat (≥ 3.0.0)
VignetteBuilder: knitr
Config/testthat/edition: 3
Encoding: UTF-8
RoxygenNote: 7.3.1
NeedsCompilation: no
Packaged: 2024-09-10 21:02:16 UTC; lshandross
Author: Evan Ray [aut, cre], Aaron Gerding [aut], Li Shandross [ctb], Nick Reich [ctb]
Maintainer: Evan Ray <elray@umass.edu>
Repository: CRAN
Date/Publication: 2024-09-13 18:00:06 UTC

Identify duplicated values in a sorted numeric vector, up to a numeric tolerance

Description

Identify duplicated values in a sorted numeric vector, where comparison is up to a specified numeric tolerance. If there is a run of values where each consecutive pair is closer together than the tolerance, all are labeled as duplicates even if not all values in the run are within the tolerance of one another.

Usage

duplicated_tol(x, tol = 1e-06, incl_first = FALSE)

Arguments

x

a numeric vector in which to identify duplicates

tol

numeric tolerance for identifying duplicates

incl_first

boolean indicator of whether or not the first entry in a run of duplicates should be indicated as a duplicate. FALSE mirrors the behavior of the base R function duplicated.

Value

a boolean vector of the same length as x
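
The chaining behavior described above can be illustrated with a minimal base R sketch. This is not the package's implementation: the helper name duplicated_tol_sketch is hypothetical, and the sketch assumes x is sorted in increasing order and that "closer together than the tolerance" means a gap strictly less than tol.

```r
# Hypothetical helper illustrating tolerance-based duplicate detection
# on a sorted numeric vector; not the package's own code.
duplicated_tol_sketch <- function(x, tol = 1e-06, incl_first = FALSE) {
  n <- length(x)
  if (n < 2) {
    return(rep(FALSE, n))
  }
  # TRUE where an entry is within tol of its immediate predecessor
  close_to_prev <- c(FALSE, diff(x) < tol)
  if (incl_first) {
    # also flag the entry that starts each run
    close_to_next <- c(diff(x) < tol, FALSE)
    return(close_to_prev | close_to_next)
  }
  close_to_prev
}

x <- c(1, 1 + 4e-07, 1 + 8e-07, 1 + 1.2e-06, 2)
dups <- duplicated_tol_sketch(x)
dups_incl <- duplicated_tol_sketch(x, incl_first = TRUE)
```

In this input the first and fourth values differ by more than tol, yet the first four values form a single run because every consecutive gap is below tol.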


Get indices of starts and ends of runs of duplicate values

Description

Get indices of starts and ends of runs of duplicate values

Usage

get_dup_run_inds(dups)

Arguments

dups

a boolean vector that would result from calling duplicated_tol(..., incl_first = FALSE)

Value

named list with entries starts, giving indices of the first element in each run of duplicate values, and ends, giving indices of the last element in each run of duplicate values.
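
One way to compute such indices in base R is sketched below. The helper name dup_run_inds_sketch is hypothetical, and the sketch assumes that, because dups comes from incl_first = FALSE, the start of a run is the unflagged entry immediately before the first flagged one.

```r
# Hypothetical helper; an interpretation of the documented return value,
# not the package's own code.
dup_run_inds_sketch <- function(dups) {
  # +1 marks where a run of TRUEs begins, -1 marks one past where it ends
  d <- diff(c(FALSE, dups, FALSE))
  first_flagged <- which(d == 1)
  last_flagged <- which(d == -1) - 1
  # with incl_first = FALSE, the entry just before the first flagged one
  # is the (unflagged) start of the run
  list(starts = first_flagged - 1, ends = last_flagged)
}

dups <- c(FALSE, TRUE, TRUE, FALSE, FALSE, TRUE)
run_inds <- dup_run_inds_sketch(dups)
```

For this input, the runs cover positions 1 to 3 and 5 to 6.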


Create a density function for a distribution approximated from its quantiles

Description

Creates a function that evaluates the probability density function of an approximation to a distribution obtained by interpolating and extrapolating from a set of quantiles of the distribution.

Usage

make_d_fn(
  ps,
  qs,
  interior_method = "spline_cdf",
  interior_args = list(),
  tail_dist = "norm",
  dup_tol = 1e-06,
  zero_tol = 1e-12
)

Arguments

ps

vector of probability levels

qs

vector of quantile values corresponding to ps

interior_method

method for interpolating the distribution on the interior of the provided qs. This package provides one method for this, "spline_cdf". The user may also provide a custom function; see the details for more.

interior_args

an optional named list of arguments that are passed on to the interior_method

tail_dist

name of parametric distribution for the tails

dup_tol

numeric tolerance for identifying duplicated values indicating a discrete component of the distribution. If there is a run of values where each consecutive pair is closer together than the tolerance, all are labeled as duplicates even if not all values in the run are within the tolerance.

zero_tol

numeric tolerance for identifying values in qs that are (approximately) zero.

Details

The default interior_method, "spline_cdf", represents the distribution as a sum of a discrete component at any points where there are duplicated qs for multiple different ps and a continuous component that is estimated by using a monotonic cubic spline that interpolates the provided (q, p) pairs as an estimate of the CDF. The density function is then obtained by differentiating this estimate of the CDF.

Optionally, the user may provide another function that accepts arguments ps, qs, tail_dist, and fn_type (which will be either "d", "p", or "q"), and optionally additional named arguments to be specified via interior_args. This function should return a function with arguments x and log that evaluates the pdf or its logarithm.

Value

a function with arguments x and log that can be used to evaluate the approximate density function (or its log) at the points x.
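
The idea of differentiating a monotone spline estimate of the CDF can be sketched in base R. The sketch uses stats::splinefun with method = "hyman" as a stand-in for the package's internal spline; it covers only the interior of the quantiles and ignores tails and discrete components. The quantile values are made-up inputs, and the names cdf_hat and pdf_hat are hypothetical.

```r
# Assumed example input: five quantiles of a standard normal distribution
ps <- c(0.1, 0.25, 0.5, 0.75, 0.9)
qs <- qnorm(ps)

# Monotone cubic spline through the (q, p) pairs as a CDF estimate.
# stats::splinefun(method = "hyman") is a stand-in, not the package's code.
cdf_hat <- splinefun(qs, ps, method = "hyman")

# Density estimate on the interior: first derivative of the CDF estimate
pdf_hat <- function(x, log = FALSE) {
  d <- cdf_hat(x, deriv = 1)
  if (log) log(d) else d
}

x_grid <- seq(min(qs), max(qs), length.out = 101)
dens <- pdf_hat(x_grid)
```

Because the spline is monotone, its derivative is non-negative on the interior, which is what makes it usable as a density estimate there.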


Create a cumulative distribution function for a distribution approximated from its quantiles

Description

Creates a function that evaluates the cumulative distribution function of an approximation to a distribution obtained by interpolating and extrapolating from a set of quantiles of the distribution.

Usage

make_p_fn(
  ps,
  qs,
  interior_method = "spline_cdf",
  interior_args = list(),
  tail_dist = "norm",
  dup_tol = 1e-06,
  zero_tol = 1e-12
)

Arguments

ps

vector of probability levels

qs

vector of quantile values corresponding to ps

interior_method

method for interpolating the distribution on the interior of the provided qs. This package provides one method for this, "spline_cdf". The user may also provide a custom function; see the details for more.

interior_args

an optional named list of arguments that are passed on to the interior_method

tail_dist

name of parametric distribution for the tails

dup_tol

numeric tolerance for identifying duplicated values indicating a discrete component of the distribution. If there is a run of values where each consecutive pair is closer together than the tolerance, all are labeled as duplicates even if not all values in the run are within the tolerance.

zero_tol

numeric tolerance for identifying values in qs that are (approximately) zero.

Details

The default interior_method, "spline_cdf", represents the distribution as a sum of a discrete component at any points where there are duplicated qs for multiple different ps and a continuous component that is estimated by using a monotonic cubic spline that interpolates the provided (q, p) pairs as an estimate of the CDF.

Optionally, the user may provide another function that accepts arguments ps, qs, tail_dist, and fn_type (which will be either "d", "p", or "q"), and optionally additional named arguments to be specified via interior_args. This function should return a function with arguments q and log.p that evaluates the CDF or its logarithm.

Value

a function with arguments q and log.p that can be used to evaluate the approximate cumulative distribution function (or its log) at the points q.
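
A short usage sketch follows, assuming the distfromq package is installed. The quantile values are made-up inputs taken from a standard normal distribution; only the arguments documented above are used.

```r
library(distfromq)

# Assumed example input: five quantiles of a standard normal distribution
ps <- c(0.1, 0.25, 0.5, 0.75, 0.9)
qs <- qnorm(ps)

p_fn <- make_p_fn(ps = ps, qs = qs, tail_dist = "norm")

# Evaluate the approximate CDF at the supplied quantiles and in the tails
p_at_qs <- p_fn(qs)
p_tails <- p_fn(c(-3, 3))

# Log-scale evaluation via the documented log.p argument
p_log <- p_fn(qs, log.p = TRUE)
```

Since the interior method interpolates the provided (q, p) pairs, p_at_qs is expected to match ps up to numerical error.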


Create a quantile function for a distribution approximated from its quantiles

Description

Creates a function that evaluates the quantile function of an approximation to a distribution obtained by interpolating and extrapolating from a set of quantiles of the distribution.

Usage

make_q_fn(
  ps,
  qs,
  interior_method = "spline_cdf",
  interior_args = list(),
  tail_dist = "norm",
  dup_tol = 1e-06,
  zero_tol = 1e-12
)

Arguments

ps

vector of probability levels

qs

vector of quantile values corresponding to ps

interior_method

method for interpolating the distribution on the interior of the provided qs. This package provides one method for this, "spline_cdf". The user may also provide a custom function; see the details for more.

interior_args

an optional named list of arguments that are passed on to the interior_method

tail_dist

name of parametric distribution for the tails

dup_tol

numeric tolerance for identifying duplicated values indicating a discrete component of the distribution. If there is a run of values where each consecutive pair is closer together than the tolerance, all are labeled as duplicates even if not all values in the run are within the tolerance.

zero_tol

numeric tolerance for identifying values in qs that are (approximately) zero.

Details

The default interior_method, "spline_cdf", represents the distribution as a sum of a discrete component at any points where there are duplicated qs for multiple different ps and a continuous component that is estimated by using a monotonic cubic spline that interpolates the provided (q, p) pairs as an estimate of the CDF. The quantile function is then obtained by inverting this estimate of the CDF.

Optionally, the user may provide another function that accepts arguments ps, qs, tail_dist, and fn_type (which will be either "d", "p", or "q"), and optionally additional named arguments to be specified via interior_args. This function should return a function with argument p that evaluates the quantile function.

Value

a function with argument p that can be used to calculate quantiles of the approximated distribution at the probability levels p.
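
The shape of a custom interior method, as described in the Details, can be sketched as follows. The function linear_cdf_sketch is hypothetical and replaces the spline with plain linear interpolation of the (q, p) pairs; it assumes qs are strictly increasing and does not use tail_dist. Exactly which ps and qs the package passes to such a function is not specified here, so treat this as an illustration of the interface only.

```r
# Hypothetical custom interior method: piecewise linear CDF through (qs, ps).
# Assumes strictly increasing qs; tail_dist is accepted but unused.
linear_cdf_sketch <- function(ps, qs, tail_dist, fn_type = c("d", "p", "q")) {
  fn_type <- match.arg(fn_type)
  switch(fn_type,
    d = function(x, log = FALSE) {
      # slope of the CDF segment containing each x
      slopes <- diff(ps) / diff(qs)
      seg <- findInterval(x, qs, all.inside = TRUE)
      dens <- slopes[seg]
      if (log) log(dens) else dens
    },
    p = function(q, log.p = FALSE) {
      cdf <- approx(qs, ps, xout = q)$y
      if (log.p) log(cdf) else cdf
    },
    # inverting a piecewise linear CDF amounts to swapping the axes
    q = function(p) approx(ps, qs, xout = p)$y
  )
}

ps <- c(0.1, 0.5, 0.9)
qs <- c(-1, 0, 2)
q_fn <- linear_cdf_sketch(ps, qs, tail_dist = "norm", fn_type = "q")
p_fn <- linear_cdf_sketch(ps, qs, tail_dist = "norm", fn_type = "p")
```

The returned closures follow the documented argument names: x and log for "d", q and log.p for "p", and p for "q".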


Create a random deviate generator for a distribution approximated from its quantiles

Description

Creates a function that generates random deviates from an approximation to a distribution obtained by interpolating and extrapolating from a set of quantiles of the distribution.

Usage

make_r_fn(
  ps,
  qs,
  interior_method = "spline_cdf",
  interior_args = list(),
  tail_dist = "norm",
  dup_tol = 1e-06,
  zero_tol = 1e-12
)

Arguments

ps

vector of probability levels

qs

vector of quantile values corresponding to ps

interior_method

method for interpolating the distribution on the interior of the provided qs. This package provides one method for this, "spline_cdf". The user may also provide a custom function; see the details for more.

interior_args

an optional named list of arguments that are passed on to the interior_method

tail_dist

name of parametric distribution for the tails

dup_tol

numeric tolerance for identifying duplicated values indicating a discrete component of the distribution. If there is a run of values where each consecutive pair is closer together than the tolerance, all are labeled as duplicates even if not all values in the run are within the tolerance.

zero_tol

numeric tolerance for identifying values in qs that are (approximately) zero.

Details

The default interior_method, "spline_cdf", represents the distribution as a sum of a discrete component at any points where there are duplicated qs for multiple different ps and a continuous component that is estimated by using a monotonic cubic spline that interpolates the provided (q, p) pairs as an estimate of the CDF. The quantile function is then obtained by inverting this estimate of the CDF.

Optionally, the user may provide another function that accepts arguments ps, qs, tail_dist, and fn_type (which will be either "d", "p", or "q"), and optionally additional named arguments to be specified via interior_args. This function should return a function with argument p that evaluates the quantile function.

Value

a function with argument n that can be used to generate a sample of size n from the approximated distribution.
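
The documentation above does not state how samples are drawn. One standard way to turn a quantile function into a sampler is inverse transform sampling, sketched here in base R. The names q_fn_sketch and r_fn_sketch are hypothetical, and the stand-in quantile function uses linear interpolation clamped at the outermost quantiles instead of fitted tails.

```r
set.seed(42)

# Assumed example input: five quantiles of a standard normal distribution
ps <- c(0.1, 0.25, 0.5, 0.75, 0.9)
qs <- qnorm(ps)

# Stand-in quantile function: linear interpolation of (p, q) pairs,
# clamped to the outermost quantiles (rule = 2) rather than using tails
q_fn_sketch <- approxfun(ps, qs, rule = 2)

# Hypothetical sampler: push uniform draws through the quantile function
r_fn_sketch <- function(n) q_fn_sketch(runif(n))

draws <- r_fn_sketch(1000)
```

With a quantile function that also models the tails, the same construction would produce draws outside the range of the supplied quantiles.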


Create a polySpline object representing a monotonic Hermite spline interpolating a given set of points.

Description

Create a polySpline object representing a monotonic Hermite spline interpolating a given set of points.

Usage

mono_Hermite_spline(x, y, m)

Arguments

x

vector giving the x coordinates of the points to be interpolated.

y

vector giving the y coordinates of the points to be interpolated. Must be increasing or decreasing for 'method = "hyman"'.

m

(for 'splinefunH()') vector of slopes m_i at the points (x_i, y_i); these together determine the Hermite "spline", which is piecewise cubic and (only) once continuously differentiable.

Details

This function essentially reproduces stats::splinefunH, but it returns a polynomial spline object as used in the splines package rather than a function that evaluates the spline, and potentially makes adjustments to the input slopes m to enforce monotonicity.

Value

An object of class polySpline with the spline object, suitable for use with other functionality from the splines package.
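
For comparison, the base R function stats::splinefunH mentioned in the Details can be used as sketched below. The points and the slope rule (averaging neighboring secant slopes) are assumptions chosen for illustration; unlike mono_Hermite_spline, this call returns an evaluating function and makes no monotonicity adjustment to m.

```r
x <- c(0, 1, 2, 4)
y <- c(0, 0.2, 0.7, 1)

# Assumed slopes for illustration: averages of neighboring secant slopes,
# with one-sided secants at the two ends
secants <- diff(y) / diff(x)
m <- c(secants[1],
       (head(secants, -1) + tail(secants, -1)) / 2,
       tail(secants, 1))

# stats::splinefunH returns an evaluating function, not a polySpline object
h <- splinefunH(x, y, m)

vals_at_knots <- h(x)
slopes_at_knots <- h(x, deriv = 1)
```

The Hermite spline reproduces both the supplied values and the supplied slopes at the knots.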


Approximate a density function, CDF, or quantile function on the interior of provided quantiles

Description

Approximate density function, CDF, or quantile function on the interior of provided quantiles by representing the distribution as a sum of a discrete part at any duplicated qs and a continuous part for which the CDF is estimated using a monotonic Hermite spline. See details for more.

Usage

spline_cdf(ps, qs, tail_dist, fn_type = c("d", "p", "q"), n_grid = 20)

Arguments

ps

vector of probability levels

qs

vector of quantile values corresponding to ps

tail_dist

name of parametric distribution for the tails

fn_type

the type of function that is requested: "d" for a PDF, "p" for a CDF, or "q" for a quantile function.

n_grid

grid size to use when augmenting the input qs to obtain a finer grid of points along which we form a piecewise linear approximation to the spline. n_grid evenly-spaced points are inserted between each pair of consecutive values in qs. The default value is 20. This can be set to NULL, in which case the piecewise linear approximation is not used. This is not recommended if the fn_type is "q".

Details

The CDF of the continuous part of the distribution is estimated using a monotonic degree 3 Hermite spline that interpolates the quantiles after subtracting the discrete distribution and renormalizing. In theory, an estimate of the quantile function could be obtained by directly inverting this spline. However, in practice, we have observed that this can suffer from numerical problems. Therefore, the default behavior of this function is to evaluate the "stage 1" CDF estimate corresponding to discrete point masses plus monotonic spline at a fine grid of points, and use the "stage 2" CDF estimate that linearly interpolates these points with steps at any duplicated q values. The quantile function estimate is obtained by inverting this "stage 2" CDF estimate.

When the distribution is continuous, we can obtain an estimate of the PDF by differentiating the CDF estimate, resulting in a discontinuous "histogram density". The size of the grid can be specified with the n_grid argument. In settings where it is desirable to obtain a continuous density function, the "stage 1" CDF estimate can be used by setting n_grid = NULL.

Value

a function to evaluate the PDF, CDF, or quantile function.
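
The two-stage construction described in the Details can be sketched in base R for a purely continuous distribution. This uses stats::splinefun(method = "hyman") as a stand-in for the package's monotonic Hermite spline and omits discrete point masses and tails; all object names are hypothetical and the quantiles are made-up inputs.

```r
# Assumed example input: five quantiles of a standard normal distribution
ps <- c(0.1, 0.25, 0.5, 0.75, 0.9)
qs <- qnorm(ps)
n_grid <- 20

# "Stage 1" stand-in: monotone cubic spline CDF through the (q, p) pairs
stage1_cdf <- splinefun(qs, ps, method = "hyman")

# Insert n_grid evenly spaced points between each pair of consecutive qs
fine_qs <- unlist(lapply(seq_len(length(qs) - 1), function(i) {
  head(seq(qs[i], qs[i + 1], length.out = n_grid + 2), -1)
}))
fine_qs <- c(fine_qs, qs[length(qs)])
fine_ps <- stage1_cdf(fine_qs)

# "Stage 2": piecewise linear CDF through the fine grid, and its inverse,
# obtained by swapping the roles of the two coordinates
stage2_cdf <- approxfun(fine_qs, fine_ps)
stage2_qf <- approxfun(fine_ps, fine_qs)
```

Inverting the piecewise linear "stage 2" estimate is a simple axis swap, which avoids root-finding on the cubic spline.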


Split ps and qs into those corresponding to discrete and continuous parts of a distribution.

Description

Split ps and qs into those corresponding to discrete and continuous parts of a distribution.

Usage

split_disc_cont_ps_qs(
  ps,
  qs,
  dup_tol = 1e-06,
  zero_tol = 1e-12,
  is_hurdle = FALSE
)

Arguments

ps

vector of probability levels

qs

vector of quantile values corresponding to ps

dup_tol

numeric tolerance for identifying duplicated values indicating a discrete component of the distribution. If there is a run of values where each consecutive pair is closer together than the tolerance, all are labeled as duplicates even if not all values in the run are within the tolerance.

zero_tol

numeric tolerance for identifying values in qs that are (approximately) zero.

is_hurdle

boolean indicating whether or not this is a hurdle model. If so, qs of zero always indicate the presence of a point mass at 0. In this case, 0 is not included among the returned cont_qs. Setting this argument to TRUE is primarily appropriate when we are working with a distributional family whose support is bounded below by 0 (and may have density 0 at 0), such as a lognormal.

Value

named list with the following entries:


A factory that returns a function that performs linear interpolation, allowing for "steps" or discontinuities.

Description

A factory that returns a function that performs linear interpolation, allowing for "steps" or discontinuities.

Usage

step_interp_factory(x, y, cont_dir = c("right", "left"), increasing = TRUE)

Arguments

x

numeric vector with the "horizontal axis" coordinates of the points to interpolate.

y

numeric vector with the "vertical axis" coordinates of the points to interpolate.

cont_dir

at steps or discontinuities, the direction from which the function is continuous. This will be "right" for a CDF or "left" for a QF.

increasing

boolean indicating whether the function is increasing or decreasing. Only used in the degenerate case where there is only one unique value of x.

Value

a function with argument x that performs linear approximation of the input data points.
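
The role of cont_dir at a step can be illustrated with a minimal sketch. The helper step_interp_sketch is hypothetical and much simpler than a full implementation: it assumes x is sorted, that a step is encoded by repeating an x value with two different y values, and that every evaluation point lies within the range of x.

```r
# Hypothetical helper: linear interpolation with steps at repeated x values.
# Not the package's implementation; no handling of out-of-range inputs.
step_interp_sketch <- function(x, y, cont_dir = c("right", "left")) {
  cont_dir <- match.arg(cont_dir)
  function(xout) {
    vapply(xout, function(x0) {
      exact <- which(x == x0)
      if (length(exact) > 0) {
        # at a step, take y from the side on which the function is continuous
        if (cont_dir == "right") y[max(exact)] else y[min(exact)]
      } else {
        # ordinary linear interpolation between the neighboring points
        lo <- max(which(x < x0))
        hi <- min(which(x > x0))
        y[lo] + (y[hi] - y[lo]) * (x0 - x[lo]) / (x[hi] - x[lo])
      }
    }, numeric(1))
  }
}

# A CDF-like curve with a jump from 0.2 to 0.6 at x = 1
x <- c(0, 1, 1, 2)
y <- c(0, 0.2, 0.6, 1)
f_right <- step_interp_sketch(x, y, cont_dir = "right")
f_left <- step_interp_sketch(x, y, cont_dir = "left")
```

At x = 1 the right-continuous version returns the upper value of the jump, as a CDF would, while the left-continuous version returns the lower value.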


Get unique values in a sorted numeric vector, up to a numeric tolerance

Description

Get unique values in a sorted numeric vector, where comparison is up to a specified numeric tolerance. If there is a run of values where each consecutive pair is closer together than the tolerance, all are labeled as corresponding to a single unique value even if not all values in the run are within the tolerance.

Usage

unique_tol(x, tol = 1e-06, ties = mean)

Arguments

x

a numeric vector in which to identify duplicates

tol

numeric tolerance for identifying duplicates

ties

a function that is used to summarize groups of values that fall within the tolerance

Value

a numeric vector of the unique values in x
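
The grouping and summarizing described above can be sketched in base R. The helper unique_tol_sketch is hypothetical and is not the package's implementation; it assumes x is sorted in increasing order and that a new group starts whenever the gap to the previous value is at least tol.

```r
# Hypothetical helper illustrating tolerance-based unique values
unique_tol_sketch <- function(x, tol = 1e-06, ties = mean) {
  if (length(x) == 0) {
    return(x)
  }
  # start a new group whenever the gap to the previous value is at least tol
  group <- cumsum(c(TRUE, diff(x) >= tol))
  # summarize each group with the ties function and drop tapply's attributes
  as.vector(tapply(x, group, ties))
}

x <- c(1, 1 + 4e-07, 2, 3, 3 + 5e-07)
u_mean <- unique_tol_sketch(x)
u_min <- unique_tol_sketch(x, ties = min)
```

With the default ties = mean each group is replaced by its average; passing ties = min keeps the smallest member of each group instead.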

