| Title: | Reconstruct a Distribution from a Collection of Quantiles |
| Version: | 1.0.4 |
| Description: | Given a set of predictive quantiles from a distribution, estimate the distribution and create 'd', 'p', 'q', and 'r' functions to evaluate its density function, distribution function, and quantile function, and generate random samples. On the interior of the provided quantiles, an interpolation method such as a monotonic cubic spline is used; the tails are approximated by a location-scale family. |
| License: | GPL (≥ 3) |
| URL: | http://reichlab.io/distfromq/ |
| Imports: | checkmate, purrr, splines, stats, utils, zeallot |
| Suggests: | dplyr, ggplot2, knitr, rmarkdown, testthat (≥ 3.0.0) |
| VignetteBuilder: | knitr |
| Config/testthat/edition: | 3 |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.1 |
| NeedsCompilation: | no |
| Packaged: | 2024-09-10 21:02:16 UTC; lshandross |
| Author: | Evan Ray [aut, cre], Aaron Gerding [aut], Li Shandross [ctb], Nick Reich [ctb] |
| Maintainer: | Evan Ray <elray@umass.edu> |
| Repository: | CRAN |
| Date/Publication: | 2024-09-13 18:00:06 UTC |
Identify duplicated values in a sorted numeric vector, where comparison is up to a specified numeric tolerance. If there is a run of values where each consecutive pair is closer together than the tolerance, all are labeled as duplicates even if not all values in the run are within the tolerance.
Description
Identify duplicated values in a sorted numeric vector, where comparison is up to a specified numeric tolerance. If there is a run of values where each consecutive pair is closer together than the tolerance, all are labeled as duplicates even if not all values in the run are within the tolerance.
Usage
duplicated_tol(x, tol = 1e-06, incl_first = FALSE)
Arguments
x | a numeric vector in which to identify duplicates |
tol | numeric tolerance for identifying duplicates |
incl_first | boolean indicator of whether or not the first entry in a run of duplicates should be indicated as a duplicate. |
Value
a boolean vector of the same length as x
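The chaining rule in the description can be illustrated with a minimal base R sketch. The helper name `duplicated_tol_sketch` is hypothetical and this is not the package's implementation; it assumes `x` is sorted and has at least one element.

```r
# Hypothetical helper: flag values that sit closer than `tol` to a neighbour.
duplicated_tol_sketch <- function(x, tol = 1e-6, incl_first = FALSE) {
  gaps_small <- diff(x) < tol            # TRUE where a consecutive pair is close
  close_to_prev <- c(FALSE, gaps_small)  # every run member except the first
  if (!incl_first) {
    return(close_to_prev)
  }
  close_to_next <- c(gaps_small, FALSE)  # also picks up the first run member
  close_to_prev | close_to_next
}

x <- c(1, 1.05, 1.1, 2, 3, 3.05)
dups <- duplicated_tol_sketch(x, tol = 0.1)
dups_first <- duplicated_tol_sketch(x, tol = 0.1, incl_first = TRUE)
```

Here 1 and 1.1 are not within the tolerance of each other, but both end up in the same run because each consecutive gap is below `tol`.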
Get indices of starts and ends of runs of duplicate values
Description
Get indices of starts and ends of runs of duplicate values
Usage
get_dup_run_inds(dups)
Arguments
dups | a boolean vector that would result from calling |
Value
named list with entries starts giving indices of the first element in each sequence of runs of duplicate values and ends giving indices of the last element in each sequence of runs of duplicate values.
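One way to locate runs in a boolean vector is run-length encoding. The sketch below uses base R's `rle()`; the helper name `run_inds_sketch` is hypothetical, and it assumes that every member of a run, including the first, is flagged `TRUE`. The package function may expect a different flagging convention for its input.

```r
# Hypothetical helper: start and end indices of each run of TRUE values.
run_inds_sketch <- function(dups) {
  runs <- rle(dups)
  ends <- cumsum(runs$lengths)         # last index of every run (TRUE or FALSE)
  starts <- ends - runs$lengths + 1L   # first index of every run
  list(starts = starts[runs$values], ends = ends[runs$values])
}

inds <- run_inds_sketch(c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE))
```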
Creates a function that evaluates the probability density function of an approximation to a distribution obtained by interpolating and extrapolating from a set of quantiles of the distribution.
Description
Creates a function that evaluates the probability density function of an approximation to a distribution obtained by interpolating and extrapolating from a set of quantiles of the distribution.
Usage
make_d_fn(ps, qs, interior_method = "spline_cdf", interior_args = list(), tail_dist = "norm", dup_tol = 1e-06, zero_tol = 1e-12)
Arguments
ps | vector of probability levels |
qs | vector of quantile values corresponding to ps |
interior_method | method for interpolating the distribution on the interior of the provided |
interior_args | an optional named list of arguments that are passed on to the |
tail_dist | name of parametric distribution for the tails |
dup_tol | numeric tolerance for identifying duplicated values indicating a discrete component of the distribution. If there is a run of values where each consecutive pair is closer together than the tolerance, all are labeled as duplicates even if not all values in the run are within the tolerance. |
zero_tol | numeric tolerance for identifying values in |
Details
The default interior_method, "spline_cdf", represents the distribution as a sum of a discrete component at any points where there are duplicated qs for multiple different ps and a continuous component that is estimated by using a monotonic cubic spline that interpolates the provided (q, p) pairs as an estimate of the CDF. The density function is then obtained by differentiating this estimate of the CDF.
Optionally, the user may provide another function that accepts arguments ps, qs, tail_dist, and fn_type (which will be either "d", "p", or "q"), and optionally additional named arguments to be specified via interior_args. This function should return a function with arguments x, log that evaluates the pdf or its logarithm.
Value
a function with arguments x and log that can be used to evaluate the approximate density function (or its log) at the points x.
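To make the expected shape of a user-supplied interior method concrete, here is a hedged sketch of one that treats the CDF as piecewise linear between the provided quantiles. The name `linear_interior` is hypothetical, the sketch assumes strictly increasing qs (no point masses), it returns NA outside the range of qs on the assumption that tails are handled elsewhere, and the signatures used for the "p" and "q" branches are assumptions rather than documented requirements.

```r
# Hypothetical interior method: piecewise-linear CDF through the (q, p) pairs.
# `tail_dist` is accepted to match the described signature but is unused here.
linear_interior <- function(ps, qs, tail_dist, fn_type = c("d", "p", "q"), ...) {
  fn_type <- match.arg(fn_type)
  slopes <- diff(ps) / diff(qs)  # constant density on each interval
  switch(fn_type,
    d = function(x, log = FALSE) {
      i <- findInterval(x, qs, rightmost.closed = TRUE)
      i[i < 1 | i >= length(qs)] <- NA  # outside the interior of the quantiles
      dens <- slopes[i]
      if (log) base::log(dens) else dens
    },
    p = function(q, log.p = FALSE) {    # assumed signature
      cdf <- stats::approx(qs, ps, xout = q)$y
      if (log.p) base::log(cdf) else cdf
    },
    q = function(p) stats::approx(ps, qs, xout = p)$y  # assumed signature
  )
}

ps <- c(0.1, 0.5, 0.9)
qs <- c(-1, 0, 2)
d_fn <- linear_interior(ps, qs, tail_dist = "norm", fn_type = "d")
p_fn <- linear_interior(ps, qs, tail_dist = "norm", fn_type = "p")
q_fn <- linear_interior(ps, qs, tail_dist = "norm", fn_type = "q")
```

The sketch is called directly here; how such a function is handed to interior_method is not shown because the argument description is incomplete in this section.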
Creates a function that evaluates the cumulative distribution function of an approximation to a distribution obtained by interpolating and extrapolating from a set of quantiles of the distribution.
Description
Creates a function that evaluates the cumulative distribution function of an approximation to a distribution obtained by interpolating and extrapolating from a set of quantiles of the distribution.
Usage
make_p_fn(ps, qs, interior_method = "spline_cdf", interior_args = list(), tail_dist = "norm", dup_tol = 1e-06, zero_tol = 1e-12)
Arguments
ps | vector of probability levels |
qs | vector of quantile values corresponding to ps |
interior_method | method for interpolating the distribution on the interior of the provided |
interior_args | an optional named list of arguments that are passed on to the |
tail_dist | name of parametric distribution for the tails |
dup_tol | numeric tolerance for identifying duplicated values indicating a discrete component of the distribution. If there is a run of values where each consecutive pair is closer together than the tolerance, all are labeled as duplicates even if not all values in the run are within the tolerance. |
zero_tol | numeric tolerance for identifying values in |
Details
The default interior_method, "spline_cdf", represents the distribution as a sum of a discrete component at any points where there are duplicated qs for multiple different ps and a continuous component that is estimated by using a monotonic cubic spline that interpolates the provided (q, p) pairs as an estimate of the CDF.
Optionally, the user may provide another function that accepts arguments ps, qs, tail_dist, and fn_type (which will be either "d", "p", or "q"), and optionally additional named arguments to be specified via interior_args. This function should return a function with arguments x, log that evaluates the pdf or its logarithm.
Value
a function with arguments q and log.p that can be used to evaluate the approximate cumulative distribution function (or its log) at the points q.
Creates a function that evaluates the quantile function of an approximation to a distribution obtained by interpolating and extrapolating from a set of quantiles of the distribution.
Description
Creates a function that evaluates the quantile function of an approximation to a distribution obtained by interpolating and extrapolating from a set of quantiles of the distribution.
Usage
make_q_fn(ps, qs, interior_method = "spline_cdf", interior_args = list(), tail_dist = "norm", dup_tol = 1e-06, zero_tol = 1e-12)
Arguments
ps | vector of probability levels |
qs | vector of quantile values corresponding to ps |
interior_method | method for interpolating the distribution on the interior of the provided |
interior_args | an optional named list of arguments that are passed on to the |
tail_dist | name of parametric distribution for the tails |
dup_tol | numeric tolerance for identifying duplicated values indicating a discrete component of the distribution. If there is a run of values where each consecutive pair is closer together than the tolerance, all are labeled as duplicates even if not all values in the run are within the tolerance. |
zero_tol | numeric tolerance for identifying values in |
Details
The default interior_method, "spline_cdf", represents the distribution as a sum of a discrete component at any points where there are duplicated qs for multiple different ps and a continuous component that is estimated by using a monotonic cubic spline that interpolates the provided (q, p) pairs as an estimate of the CDF. The quantile function is then obtained by inverting this estimate of the CDF.
Optionally, the user may provide another function that accepts arguments ps, qs, tail_dist, and fn_type (which will be either "d", "p", or "q"), and optionally additional named arguments to be specified via interior_args. This function should return a function with argument p that evaluates the quantile function.
Value
a function with argument p that can be used to calculate quantiles of the approximated distribution at the probability levels p.
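A short usage sketch for the CDF and quantile factories follows, using only the argument names from the Usage lines and the default settings. The input quantiles are taken from a normal distribution purely for illustration, and the calls are guarded so that the snippet still runs when the package is not installed.

```r
# Illustrative inputs: nine quantiles of a Normal(2, 3) distribution.
ps <- seq(from = 0.1, to = 0.9, by = 0.1)
qs <- stats::qnorm(ps, mean = 2, sd = 3)

if (requireNamespace("distfromq", quietly = TRUE)) {
  p_fn <- distfromq::make_p_fn(ps, qs)  # function of q (and log.p)
  q_fn <- distfromq::make_q_fn(ps, qs)  # function of p

  # Evaluate inside and outside the range of the provided quantiles.
  print(p_fn(c(-5, 0, 2, 4, 9)))
  print(q_fn(c(0.01, 0.5, 0.99)))
}
```

Values requested beyond the smallest and largest provided quantiles fall in the tails, which are approximated with the family named by tail_dist.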
Creates a function that generates random deviates from an approximation to a distribution obtained by interpolating and extrapolating from a set of quantiles of the distribution.
Description
Creates a function that generates random deviates from an approximation to a distribution obtained by interpolating and extrapolating from a set of quantiles of the distribution.
Usage
make_r_fn(ps, qs, interior_method = "spline_cdf", interior_args = list(), tail_dist = "norm", dup_tol = 1e-06, zero_tol = 1e-12)
Arguments
ps | vector of probability levels |
qs | vector of quantile values corresponding to ps |
interior_method | method for interpolating the distribution on the interior of the provided |
interior_args | an optional named list of arguments that are passed on to the |
tail_dist | name of parametric distribution for the tails |
dup_tol | numeric tolerance for identifying duplicated values indicating a discrete component of the distribution. If there is a run of values where each consecutive pair is closer together than the tolerance, all are labeled as duplicates even if not all values in the run are within the tolerance. |
zero_tol | numeric tolerance for identifying values in |
Details
The default interior_method, "spline_cdf", represents the distribution as a sum of a discrete component at any points where there are duplicated qs for multiple different ps and a continuous component that is estimated by using a monotonic cubic spline that interpolates the provided (q, p) pairs as an estimate of the CDF. The quantile function is then obtained by inverting this estimate of the CDF.
Optionally, the user may provide another function that accepts arguments ps, qs, tail_dist, and fn_type (which will be either "d", "p", or "q"), and optionally additional named arguments to be specified via interior_args. This function should return a function with argument p that evaluates the quantile function.
Value
a function with argument n that can be used to generate a sample of size n from the approximated distribution.
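Because the details refer to the quantile function, a natural way to picture a sampler is inverse transform sampling: draw uniform values and push them through a quantile function. The sketch below is an assumption about the general technique, not a statement of how the package generates samples; `make_r_from_q` is a hypothetical helper and `qnorm` stands in for an estimated quantile function.

```r
# Hypothetical helper: turn any quantile function into a random generator.
make_r_from_q <- function(q_fn) {
  function(n) q_fn(stats::runif(n))  # inverse transform sampling
}

set.seed(42)
r_fn <- make_r_from_q(function(p) stats::qnorm(p, mean = 2, sd = 3))
samples <- r_fn(10000)
```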
Create a polySpline object representing a monotonic Hermite spline interpolating a given set of points.
Description
Create a polySpline object representing a monotonic Hermite spline interpolating a given set of points.
Usage
mono_Hermite_spline(x, y, m)
Arguments
x | vector giving the x coordinates of the points to be interpolated. |
y | vector giving the y coordinates of the points to be interpolated. Must be increasing or decreasing for 'method = "hyman"'. |
m | (for 'splinefunH()') vector of slopes |
Details
This function essentially reproduces stats::splinefunH, but it returns a polynomial spline object as used in the splines package rather than a function that evaluates the spline, and potentially makes adjustments to the input slopes m to enforce monotonicity.
Value
An object of class polySpline with the spline object, suitable for use with other functionality from the splines package.
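For orientation, base R already offers function-valued monotone Hermite interpolation. The sketch below uses `stats::splinefun()` with `method = "monoH.FC"` and then rebuilds the same curve with `stats::splinefunH()` from explicit slopes, which is the role the m argument plays. It returns evaluating functions rather than a polySpline object, so it only illustrates the interpolation idea; the data points are made up.

```r
# Made-up increasing data, e.g. (quantile, probability) pairs.
x <- c(-1.5, -0.5, 0, 0.8, 2)
y <- c(0.05, 0.3, 0.5, 0.8, 0.98)

# Monotone cubic Hermite interpolation (Fritsch-Carlson slopes).
mono_fn <- stats::splinefun(x, y, method = "monoH.FC")

grid <- seq(min(x), max(x), length.out = 201)
fitted <- mono_fn(grid)             # spline values
slopes <- mono_fn(grid, deriv = 1)  # first derivative, non-negative if monotone

# The same Hermite spline, specified through its slopes at the knots.
m <- mono_fn(x, deriv = 1)
hermite_fn <- stats::splinefunH(x, y, m = m)
```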
Approximate density function, CDF, or quantile function on the interior of provided quantiles by representing the distribution as a sum of a discrete part at any duplicated qs and a continuous part for which the CDF is estimated using a monotonic Hermite spline. See details for more.
Description
Approximate density function, CDF, or quantile function on the interior of provided quantiles by representing the distribution as a sum of a discrete part at any duplicated qs and a continuous part for which the CDF is estimated using a monotonic Hermite spline. See details for more.
Usage
spline_cdf(ps, qs, tail_dist, fn_type = c("d", "p", "q"), n_grid = 20)
Arguments
ps | vector of probability levels |
qs | vector of quantile values corresponding to ps |
tail_dist | name of parametric distribution for the tails |
fn_type | the type of function that is requested: |
n_grid | grid size to use when augmenting the input |
Details
The CDF of the continuous part of the distribution is estimated using a monotonic degree 3 Hermite spline that interpolates the quantiles after subtracting the discrete distribution and renormalizing. In theory, an estimate of the quantile function could be obtained by directly inverting this spline. However, in practice, we have observed that this can suffer from numerical problems. Therefore, the default behavior of this function is to evaluate the "stage 1" CDF estimate corresponding to discrete point masses plus monotonic spline at a fine grid of points, and use the "stage 2" CDF estimate that linearly interpolates these points with steps at any duplicated q values. The quantile function estimate is obtained by inverting this "stage 2" CDF estimate. When the distribution is continuous, we can obtain an estimate of the PDF by differentiating the CDF estimate, resulting in a discontinuous "histogram density". The size of the grid can be specified with the n_grid argument. In settings where it is desirable to obtain a continuous density function, the "stage 1" CDF estimate can be used by setting n_grid = NULL.
Value
a function to evaluate the PDF, CDF, or quantile function.
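The two-stage idea from the details can be sketched in base R for a purely continuous case. This is an illustration under stated assumptions, not the package's code: the monotone spline comes from `stats::splinefun(method = "monoH.FC")`, `n_grid` is interpreted here as the number of grid points placed on each interval between adjacent quantiles, point masses are ignored, and values outside the quantile range are simply clamped or set to NA instead of using a tail family.

```r
ps <- c(0.1, 0.25, 0.5, 0.75, 0.9)
qs <- stats::qnorm(ps)

# Stage 1: monotone cubic spline through the (q, p) pairs.
stage1_cdf <- stats::splinefun(qs, ps, method = "monoH.FC")

# Fine grid: n_grid points on each interval between adjacent quantiles (assumed).
n_grid <- 20
grid_q <- unique(unlist(Map(
  function(lo, hi) seq(lo, hi, length.out = n_grid),
  qs[-length(qs)], qs[-1]
)))
grid_p <- stage1_cdf(grid_q)

# Stage 2: piecewise-linear CDF through the grid; invert by swapping the axes.
stage2_cdf <- stats::approxfun(grid_q, grid_p, rule = 2)
stage2_qf <- stats::approxfun(grid_p, grid_q, rule = 2)

# Differentiating a piecewise-linear CDF gives a "histogram density".
bin_density <- diff(grid_p) / diff(grid_q)
stage2_pdf <- function(x) {
  i <- findInterval(x, grid_q, rightmost.closed = TRUE)
  i[i < 1 | i > length(bin_density)] <- NA  # outside the interior
  bin_density[i]
}
```

Inverting the piecewise-linear estimate only requires swapping the roles of the grid coordinates, which avoids solving a cubic on every interval.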
Split ps and qs into those corresponding to discrete and continuous parts of a distribution.
Description
Split ps and qs into those corresponding to discrete and continuous parts of a distribution.
Usage
split_disc_cont_ps_qs(ps, qs, dup_tol = 1e-06, zero_tol = 1e-12, is_hurdle = FALSE)
Arguments
ps | vector of probability levels |
qs | vector of quantile values corresponding to ps |
dup_tol | numeric tolerance for identifying duplicated values indicating a discrete component of the distribution. If there is a run of values where each consecutive pair is closer together than the tolerance, all are labeled as duplicates even if not all values in the run are within the tolerance. |
zero_tol | numeric tolerance for identifying values in |
is_hurdle | boolean indicating whether or not this is a hurdle model. If so, qs of zero always indicate the presence of a point mass at 0. In this case, 0 is not included among the returned |
Value
named list with the following entries:
disc_weight: estimated numeric weight of the discrete part of the distribution.
disc_ps: estimated probabilities of discrete components. May be numeric(0) if there are no estimated discrete components.
disc_qs: locations of discrete components, corresponding to duplicated values in the input qs. May be numeric(0) if there are no discrete components.
cont_ps: probability levels for the continuous part of the distribution.
cont_qs: quantile values for the continuous part of the distribution.
disc_ps_range: a list of length equal to the number of point masses in the discrete distribution. Each entry is a numeric vector of length two with the value of the CDF approaching the point mass from the left and from the right.
A factory that returns a function that performs linear interpolation, allowing for "steps" or discontinuities.
Description
A factory that returns a function that performs linear interpolation, allowing for "steps" or discontinuities.
Usage
step_interp_factory(x, y, cont_dir = c("right", "left"), increasing = TRUE)
Arguments
x | numeric vector with the "horizontal axis" coordinates of the points to interpolate. |
y | numeric vector with the "vertical axis" coordinates of the points to interpolate. |
cont_dir | at steps or discontinuities, the direction from which the function is continuous. This will be "right" for a CDF or "left" for a QF. |
increasing | boolean indicating whether the function is increasing or decreasing. Only used in the degenerate case where there is only one unique value of |
Value
a function with argument x that performs linear approximation of the input data points.
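The role of cont_dir can be shown with a small base R sketch in which a step is encoded by repeating an x value with two different y values. The helper `step_interp_sketch` is hypothetical and much simpler than a full implementation: it assumes sorted x, at least two distinct x values, no step at either end point, and it extrapolates linearly outside the range of x.

```r
# Hypothetical helper: linear interpolation with steps at repeated x values.
step_interp_sketch <- function(x, y, cont_dir = c("right", "left")) {
  cont_dir <- match.arg(cont_dir)
  n <- length(x)
  function(x_new) {
    # Right-continuous: use the last knot <= x_new.
    # Left-continuous: use the last knot < x_new.
    i <- findInterval(x_new, x, left.open = (cont_dir == "left"))
    i <- pmin(pmax(i, 1L), n - 1L)  # stay on a valid segment
    y[i] + (x_new - x[i]) / (x[i + 1] - x[i]) * (y[i + 1] - y[i])
  }
}

# A jump from 0.2 to 0.6 at x = 1.
x <- c(0, 1, 1, 2)
y <- c(0, 0.2, 0.6, 1)
right_fn <- step_interp_sketch(x, y, cont_dir = "right")
left_fn <- step_interp_sketch(x, y, cont_dir = "left")
```

At the jump, `right_fn(1)` takes the upper value and `left_fn(1)` takes the lower value, while both agree everywhere else.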
Get unique values in a sorted numeric vector, where comparison is up to a specified numeric tolerance. If there is a run of values where each consecutive pair is closer together than the tolerance, all are labeled as corresponding to a single unique value even if not all values in the run are within the tolerance.
Description
Get unique values in a sorted numeric vector, where comparison is up to a specified numeric tolerance. If there is a run of values where each consecutive pair is closer together than the tolerance, all are labeled as corresponding to a single unique value even if not all values in the run are within the tolerance.
Usage
unique_tol(x, tol = 1e-06, ties = mean)
Arguments
x | a numeric vector in which to identify duplicates |
tol | numeric tolerance for identifying duplicates |
ties | a function that is used to summarize groups of values that fall within the tolerance |
Value
a numeric vector of the unique values in x
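The grouping behaviour can be pictured with a few lines of base R: start a new group whenever the gap to the previous value is at least the tolerance, then summarize each group with `ties`. The helper `unique_tol_sketch` is hypothetical and is not the package's implementation; it assumes `x` is sorted and non-empty.

```r
# Hypothetical helper: collapse runs of nearly equal values to one value each.
unique_tol_sketch <- function(x, tol = 1e-6, ties = mean) {
  group <- cumsum(c(TRUE, diff(x) >= tol))  # new group at every large gap
  unname(vapply(split(x, group), ties, numeric(1)))
}

x <- c(1, 1.05, 1.1, 2, 3, 3.05)
u_mean <- unique_tol_sketch(x, tol = 0.1)
u_min <- unique_tol_sketch(x, tol = 0.1, ties = min)
```

With `tol = 0.1` the first three values form one group even though 1 and 1.1 are not within the tolerance of each other, because each consecutive gap is below it.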