| Type: | Package |
| Title: | Penalized Composite Link Model for Efficient Estimation of Smooth Distributions from Coarsely Binned Data |
| Version: | 1.4.4 |
| Description: | Versatile method for ungrouping histograms (binned count data) assuming that counts are Poisson distributed and that the underlying sequence on a fine grid to be estimated is smooth. The method is based on the composite link model and estimation is achieved by maximizing a penalized likelihood. Smooth detailed sequences of counts and rates are so estimated from the binned counts. Ungrouping binned data can be desirable for many reasons: Bins can be too coarse to allow for accurate analysis; comparisons can be hindered when different grouping approaches are used in different histograms; and the last interval is often wide and open-ended and, thus, covers a lot of information in the tail area. Age-at-death distributions grouped in age classes and abridged life tables are examples of binned data. Because of modest assumptions, the approach is suitable for many demographic and epidemiological applications. For a detailed description of the method and applications see Rizzi et al. (2015) <doi:10.1093/aje/kwv020>. |
| License: | MIT + file LICENSE |
| LazyData: | TRUE |
| Depends: | R (≥ 3.4.0) |
| Imports: | pbapply (≥ 1.3), Rcpp (≥ 0.12.0), Rdpack (≥ 0.8), Matrix |
| LinkingTo: | Rcpp, RcppEigen |
| Suggests: | MortalityLaws (≥ 1.5.0), knitr (≥ 1.20), rmarkdown (≥ 1.10), testthat (≥ 2.0.0) |
| RdMacros: | Rdpack |
| URL: | https://github.com/mpascariu/ungroup |
| BugReports: | https://github.com/mpascariu/ungroup/issues |
| VignetteBuilder: | knitr |
| Encoding: | UTF-8 |
| RoxygenNote: | 7.3.0 |
| NeedsCompilation: | yes |
| Packaged: | 2024-01-29 15:00:04 UTC; mpascariu |
| Author: | Marius D. Pascariu |
| Maintainer: | Marius D. Pascariu <rpascariu@outlook.com> |
| Repository: | CRAN |
| Date/Publication: | 2024-01-31 13:00:02 UTC |
ungroup: Penalized Composite Link Model for Efficient Estimation of Smooth Distributions from Coarsely Binned Data
Description
Versatile method for ungrouping histograms (binned count data) assuming that counts are Poisson distributed and that the underlying sequence on a fine grid to be estimated is smooth. The method is based on the composite link model and estimation is achieved by maximizing a penalized likelihood. Smooth detailed sequences of counts and rates are so estimated from the binned counts. Ungrouping binned data can be desirable for many reasons: Bins can be too coarse to allow for accurate analysis; comparisons can be hindered when different grouping approaches are used in different histograms; and the last interval is often wide and open-ended and, thus, covers a lot of information in the tail area. Age-at-death distributions grouped in age classes and abridged life tables are examples of binned data. Because of modest assumptions, the approach is suitable for many demographic and epidemiological applications. For a detailed description of the method and applications see Rizzi et al. (2015) doi:10.1093/aje/kwv020.
Details
To learn more about the package, start with the vignettes: browseVignettes(package = "ungroup")
Author(s)
Maintainer: Marius D. Pascariu <rpascariu@outlook.com> (ORCID)
Authors:
Silvia Rizzi <srizzi@health.sdu.dk>
Jonas Schoeley (ORCID)
Maciej J. Danko <Danko@demogr.mpg.de> (ORCID)
References
Currie ID, Durban M, Eilers PH (2004). “Smoothing and forecasting mortality rates.” Statistical Modelling, 4(4), 279–298.
Eilers PH (2007). “Ill-posed problems with counts, the composite link model and penalized likelihood.” Statistical Modelling, 7(3), 239–254. doi:10.1177/1471082X0700700302.
Hastie TJ, Tibshirani RJ (1990). “Generalized additive models.” Monographs on Statistics and Applied Probability, 43.
Human Mortality Database (2018). University of California, Berkeley (USA), and Max Planck Institute for Demographic Research (Germany). Data downloaded on 17/01/2018. https://www.mortality.org.
Pascariu MD (2018). MortalityLaws: Parametric Mortality Models, Life Tables and HMD. R package version 1.6.0, https://github.com/mpascariu/MortalityLaws.
Rizzi S, Gampe J, Eilers PHC (2015). “Efficient Estimation of Smooth Distributions From Coarsely Grouped Data.” American Journal of Epidemiology, 182(2), 138–147. doi:10.1093/aje/kwv020.
Rizzi S, Halekoh U, Thinggaard M, Engholm G, Christensen N, Johannesen TB, Lindahl-Jacobsen R (2019). “How to estimate mortality trends from grouped vital statistics.” International Journal of Epidemiology, 48(2), 571–582. doi:10.1093/ije/dyy183.
Rizzi S, Thinggaard M, Engholm G, Christensen N, Johannesen TB, Vaupel JW, Lindahl-Jacobsen R (2016). “Comparison of non-parametric methods for ungrouping coarsely aggregated data.” BMC Medical Research Methodology, 16(1), 59. doi:10.1186/s12874-016-0157-8.
Thompson R, Baker RJ (1981). “Composite link functions in generalized linear models.” Applied Statistics, 125–131.
See Also
Useful links: https://github.com/mpascariu/ungroup (report bugs at https://github.com/mpascariu/ungroup/issues)
PCLM Akaike Information Criterion
Description
Generic function calculating Akaike's ‘An Information Criterion’ for one or several fitted model objects for which a log-likelihood value can be obtained, according to the formula -2*log-likelihood + k*npar, where npar represents the number of parameters in the fitted model, and k = 2 for the usual AIC, or k = log(n) (n being the number of observations) for the so-called BIC or SBC (Schwarz's Bayesian criterion).
Usage
## S3 method for class 'pclm'
AIC(object, ..., k = 2)
Arguments
object | a fitted model object for which there exists a logLik method to extract the corresponding log-likelihood. |
... | optionally more fitted model objects. |
k | numeric, the penalty per parameter to be used; the default k = 2 is the classical AIC. |
Details
When comparing models fitted by maximum likelihood to the same data, the smaller the AIC or BIC, the better the fit.
The theory of AIC requires that the log-likelihood has been maximized: whereas AIC can be computed for models not fitted by maximum likelihood, their AIC values should not be compared.
Examples of models not ‘fitted to the same data’ are where the response is transformed (accelerated-life models are fitted to log-times) and where contingency tables have been used to summarize data.
These are generic functions (with S4 generics defined in package stats4): however, methods should be defined for the log-likelihood function logLik rather than these functions: the action of their default methods is to call logLik on all the supplied objects and assemble the results. Note that in several common cases logLik does not return the value at the MLE: see its help page.
The log-likelihood and hence the AIC/BIC is only defined up to an additive constant. Different constants have conventionally been used for different purposes and so extractAIC and AIC may give different values (and do for models of class "lm": see the help for extractAIC). Particular care is needed when comparing fits of different classes (with, for example, a comparison of a Poisson and gamma GLM being meaningless since one has a discrete response, the other continuous).
BIC is defined as AIC(object, ..., k = log(nobs(object))). This needs the number of observations to be known: the default method looks first for a "nobs" attribute on the return value from the logLik method, then tries the nobs generic, and if neither succeeds returns BIC as NA.
Value
If just one object is provided, a numeric value with the corresponding AIC (or BIC, or ..., depending on k).
If multiple objects are provided, a data.frame with rows corresponding to the objects and columns representing the number of parameters in the model (df) and the AIC or BIC.
References
Sakamoto, Y., Ishiguro, M., and Kitagawa G. (1986). Akaike Information Criterion Statistics. D. Reidel Publishing Company.
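Examples
A minimal sketch (not part of the original manual) showing the pclm method for AIC and BIC on fits built from the aggregate data of the pclm help page.
library(ungroup)
x <- c(0, 1, seq(5, 85, by = 5))
y <- c(294, 66, 32, 44, 170, 284, 287, 293, 361, 600, 998, 1572, 2529, 4637,
       6161, 7369, 10481, 15293, 39016)
M1 <- pclm(x, y, nlast = 26)                  # smoothing parameters selected by BIC (default)
M2 <- pclm(x, y, nlast = 26, out.step = 0.5)  # same data, finer output grid
AIC(M1)   # classical AIC (k = 2) via the pclm method
AIC(M2)
BIC(M1)   # see the BIC method for pclm documented below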
PCLM-2D Akaike Information Criterion
Description
Generic function calculating Akaike's ‘An Information Criterion’ for one or several fitted model objects for which a log-likelihood value can be obtained, according to the formula -2*log-likelihood + k*npar, where npar represents the number of parameters in the fitted model, and k = 2 for the usual AIC, or k = log(n) (n being the number of observations) for the so-called BIC or SBC (Schwarz's Bayesian criterion).
Usage
## S3 method for class 'pclm2D'
AIC(object, ..., k = 2)
Arguments
object | a fitted model object for which there exists a logLik method to extract the corresponding log-likelihood. |
... | optionally more fitted model objects. |
k | numeric, the penalty per parameter to be used; the default k = 2 is the classical AIC. |
Details
When comparing models fitted by maximum likelihood to the same data, the smaller the AIC or BIC, the better the fit.
The theory of AIC requires that the log-likelihood has been maximized: whereas AIC can be computed for models not fitted by maximum likelihood, their AIC values should not be compared.
Examples of models not ‘fitted to the same data’ are where the response is transformed (accelerated-life models are fitted to log-times) and where contingency tables have been used to summarize data.
These are generic functions (with S4 generics defined in package stats4): however, methods should be defined for the log-likelihood function logLik rather than these functions: the action of their default methods is to call logLik on all the supplied objects and assemble the results. Note that in several common cases logLik does not return the value at the MLE: see its help page.
The log-likelihood and hence the AIC/BIC is only defined up to an additive constant. Different constants have conventionally been used for different purposes and so extractAIC and AIC may give different values (and do for models of class "lm": see the help for extractAIC). Particular care is needed when comparing fits of different classes (with, for example, a comparison of a Poisson and gamma GLM being meaningless since one has a discrete response, the other continuous).
BIC is defined as AIC(object, ..., k = log(nobs(object))). This needs the number of observations to be known: the default method looks first for a "nobs" attribute on the return value from the logLik method, then tries the nobs generic, and if neither succeeds returns BIC as NA.
Value
If just one object is provided, a numeric value with the corresponding AIC (or BIC, or ..., depending on k).
If multiple objects are provided, a data.frame with rows corresponding to the objects and columns representing the number of parameters in the model (df) and the AIC or BIC.
References
Sakamoto, Y., Ishiguro, M., and Kitagawa G. (1986). Akaike Information Criterion Statistics. D. Reidel Publishing Company.
PCLM Bayes Information Criterion
Description
Generic function calculating Akaike's ‘An Information Criterion’ for one or several fitted model objects for which a log-likelihood value can be obtained, according to the formula -2*log-likelihood + k*npar, where npar represents the number of parameters in the fitted model, and k = 2 for the usual AIC, or k = log(n) (n being the number of observations) for the so-called BIC or SBC (Schwarz's Bayesian criterion).
Usage
## S3 method for class 'pclm'
BIC(object, ...)
Arguments
object | a fitted model object for which there exists a logLik method to extract the corresponding log-likelihood. |
... | optionally more fitted model objects. |
Details
When comparing models fitted by maximum likelihood to the same data, the smaller the AIC or BIC, the better the fit.
The theory of AIC requires that the log-likelihood has been maximized: whereas AIC can be computed for models not fitted by maximum likelihood, their AIC values should not be compared.
Examples of models not ‘fitted to the same data’ are where the response is transformed (accelerated-life models are fitted to log-times) and where contingency tables have been used to summarize data.
These are generic functions (with S4 generics defined in package stats4): however, methods should be defined for the log-likelihood function logLik rather than these functions: the action of their default methods is to call logLik on all the supplied objects and assemble the results. Note that in several common cases logLik does not return the value at the MLE: see its help page.
The log-likelihood and hence the AIC/BIC is only defined up to an additive constant. Different constants have conventionally been used for different purposes and so extractAIC and AIC may give different values (and do for models of class "lm": see the help for extractAIC). Particular care is needed when comparing fits of different classes (with, for example, a comparison of a Poisson and gamma GLM being meaningless since one has a discrete response, the other continuous).
BIC is defined as AIC(object, ..., k = log(nobs(object))). This needs the number of observations to be known: the default method looks first for a "nobs" attribute on the return value from the logLik method, then tries the nobs generic, and if neither succeeds returns BIC as NA.
Value
If just one object is provided, a numeric value with the corresponding AIC (or BIC, or ..., depending on k).
If multiple objects are provided, a data.frame with rows corresponding to the objects and columns representing the number of parameters in the model (df) and the AIC or BIC.
References
Sakamoto, Y., Ishiguro, M., and Kitagawa G. (1986). Akaike Information Criterion Statistics. D. Reidel Publishing Company.
PCLM-2D Bayes Information Criterion
Description
Generic function calculating Akaike's ‘An Information Criterion’ for one or several fitted model objects for which a log-likelihood value can be obtained, according to the formula -2*log-likelihood + k*npar, where npar represents the number of parameters in the fitted model, and k = 2 for the usual AIC, or k = log(n) (n being the number of observations) for the so-called BIC or SBC (Schwarz's Bayesian criterion).
Usage
## S3 method for class 'pclm2D'
BIC(object, ..., k = 2)
Arguments
object | a fitted model object for which there exists a logLik method to extract the corresponding log-likelihood. |
... | optionally more fitted model objects. |
k | numeric, the penalty per parameter to be used; the default k = 2 is the classical AIC. |
Details
When comparing models fitted by maximum likelihood to the same data, the smaller the AIC or BIC, the better the fit.
The theory of AIC requires that the log-likelihood has been maximized: whereas AIC can be computed for models not fitted by maximum likelihood, their AIC values should not be compared.
Examples of models not ‘fitted to the same data’ are where the response is transformed (accelerated-life models are fitted to log-times) and where contingency tables have been used to summarize data.
These are generic functions (with S4 generics defined in package stats4): however, methods should be defined for the log-likelihood function logLik rather than these functions: the action of their default methods is to call logLik on all the supplied objects and assemble the results. Note that in several common cases logLik does not return the value at the MLE: see its help page.
The log-likelihood and hence the AIC/BIC is only defined up to an additive constant. Different constants have conventionally been used for different purposes and so extractAIC and AIC may give different values (and do for models of class "lm": see the help for extractAIC). Particular care is needed when comparing fits of different classes (with, for example, a comparison of a Poisson and gamma GLM being meaningless since one has a discrete response, the other continuous).
BIC is defined as AIC(object, ..., k = log(nobs(object))). This needs the number of observations to be known: the default method looks first for a "nobs" attribute on the return value from the logLik method, then tries the nobs generic, and if neither succeeds returns BIC as NA.
Value
If just one object is provided, a numeric value with the corresponding AIC (or BIC, or ..., depending on k).
If multiple objects are provided, a data.frame with rows corresponding to the objects and columns representing the number of parameters in the model (df) and the AIC or BIC.
References
Sakamoto, Y., Ishiguro, M., and Kitagawa G. (1986). Akaike Information Criterion Statistics. D. Reidel Publishing Company.
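Examples
A hedged sketch (not part of the original manual): the same criteria can be extracted from a pclm2D fit. The data preparation mirrors the pclm2D help page; fitting a two-dimensional model can take a while.
library(ungroup)
Dx <- ungroup.data$Dx
x <- c(0, 1, seq(5, 85, by = 5))
nlast <- 26
group <- rep(x, c(diff(x), nlast))
y <- aggregate(Dx, by = list(group), FUN = "sum")[, 2:10]
P1 <- pclm2D(x, y, nlast)   # default settings, no smoothing-parameter search
AIC(P1)
BIC(P1)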
Construct B-spline basis
Description
This is an internal function of package MortalitySmooth which creates an equally-spaced B-spline basis over an abscissa of data.
Usage
MortSmooth_bbase(x, xl, xr, ndx, deg)
Arguments
x | vector for the abscissa of data; |
xl | left boundary; |
xr | right boundary; |
ndx | number of internal knots minus one or number of internal intervals; |
deg | degree of the splines. |
Details
The function reproduces an algorithm presented by Eilers and Marx (2010) using differences of truncated power functions (see MortSmooth_tpower). The final matrix has a single B-spline for each of the [ndx + deg] columns. The number of rows is equal to the length of x.
The function differs from bs in the package splines since it automatically constructs B-splines with identical shape, which allows a simple interpretation of the coefficients and the application of simple differencing.
Value
A matrix containing equally-spaced B-splines of degree deg along x for each column.
Author(s)
Carlo G Camarda
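Examples
A small illustrative sketch (not from the original manual). MortSmooth_bbase is internal, so it is reached here with the ':::' operator; treat the call below as an assumption about interactive use rather than documented API.
# Cubic B-spline basis with 10 internal intervals on [0, 1]
xx <- seq(0, 1, by = 0.01)
B <- ungroup:::MortSmooth_bbase(x = xx, xl = 0, xr = 1, ndx = 10, deg = 3)
dim(B)                      # length(xx) rows, ndx + deg = 13 columns
matplot(xx, B, type = "l")  # one equally-spaced B-spline per column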
Truncated p-th Power Function
Description
This is an internal function of package MortalitySmooth which constructs a truncated p-th power function along an abscissa within the function MortSmooth_bbase.
Usage
MortSmooth_tpower(x, tt, p)
Arguments
x | for the abscissa of data; |
tt | vector of truncation points; |
p | degree of the power |
Details
Internal function used in MortSmooth_bbase. The vector tt contains the knots. The simplest system of truncated power functions uses p = 0 and consists of step functions with jumps of size 1 at the truncation points tt.
Author(s)
Carlo G Camarda
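Examples
A brief sketch (not from the original manual); again the internal function is accessed with ':::' purely for illustration.
xx <- 0:10
# p = 0 gives a step function with a jump of size 1 at the truncation point tt = 5
ungroup:::MortSmooth_tpower(x = xx, tt = 5, p = 0)
# p = 3 gives the cubic truncated power functions differenced inside MortSmooth_bbase
ungroup:::MortSmooth_tpower(x = xx, tt = 5, p = 3)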
Construct B-spline basis
Description
This is an internal function which constructs the B-spline basis to be used in pclm estimation.
Usage
build_B_spline_basis(X, Y, kr, deg, diff, type)
Arguments
X | vector with ages |
Y | vector with years |
kr | Knot ratio. Number of internal intervals used for defining 1 knot in B-spline basis construction. See |
deg | Degree of the splines needed to create equally-spaced B-splines basis over an abscissa of data. |
diff | An integer indicating the order of differences of the components of PCLM coefficients. Default value: 2. |
type | Type of PCLM model. Options: |
See Also
Build Composition Matrices
Description
Build Composition Matrices
Usage
build_C_matrix(x, y, nlast, offset, out.step, type)
Arguments
x | Vector containing the starting values of the input intervals/bins. For example: if we have 3 bins |
y | Vector with counts to be ungrouped. It must have the same dimension as |
nlast | Length of the last interval. In the example above |
offset | Optional offset term to calculate smooth mortality rates. A vector of the same length as x and y. See Rizzi et al. (2015) for further details. |
out.step | Length of estimated intervals in output. Values between 0.1 and 1 are accepted. Default: 1. |
type | Type of PCLM model. Options: |
Construct Penalty Matrix
Description
Construct Penalty Matrix
Usage
build_P_matrix(BA, BY, lambda, type)
Arguments
BA | B-spline basis object for age axis |
BY | B-spline basis object for year axis |
lambda | Smoothing parameter to be used in pclm estimation. If |
type | Type of PCLM model. Options: |
Compute Standard Errors and Confidence Intervals
Description
Compute Standard Errors and Confidence Intervals
Usage
compute_standard_errors(B, QmQ, QmQP)
Arguments
B | B-spline basis matrix |
QmQ | Object generated by pclm.fit |
QmQP | Object generated by pclm.fit |
Auxiliary for Controlling pclm Fitting
Description
Auxiliary for Controlling pclm Fitting
Usage
control.pclm(lambda = NA, kr = 2, deg = 3, int.lambda = c(0.1, 1e+5), diff = 2, opt.method = c("BIC", "AIC"), max.iter = 1e+3, tol = 1e-3)
Arguments
lambda | Smoothing parameter to be used in pclm estimation. If |
kr | Knot ratio. Number of internal intervals used for defining 1 knot in B-spline basis construction. See |
deg | Degree of the splines needed to create equally-spaced B-splines basis over an abscissa of data. |
int.lambda | If |
diff | An integer indicating the order of differences of the components of PCLM coefficients. Default value: 2. |
opt.method | Selection criterion of the model. Possible values are "BIC" (default) and "AIC". |
max.iter | Maximal number of iterations used in fitting procedure. |
tol | Relative tolerance in PCLM fitting procedure. Default: 0.1% i.e. the estimated aggregate bins should be in the 0.1% error margin. |
Value
A list with exactly eight control parameters.
See Also
Examples
control.pclm()
Auxiliary for Controlling pclm2D Fitting
Description
Auxiliary for Controlling pclm2D Fitting
Usage
control.pclm2D(lambda = c(1, 1), kr = 7, deg = 3, int.lambda = c(0.1, 1e+3), diff = 2, opt.method = c("BIC", "AIC"), max.iter = 1e+3, tol = 1e-3)
Arguments
lambda | Smoothing parameter to be used in pclm estimation. If |
kr | Knot ratio. Number of internal intervals used for defining 1 knot in B-spline basis construction. See |
deg | Degree of the splines needed to create equally-spaced B-splines basis over an abscissa of data. |
int.lambda | If |
diff | An integer indicating the order of differences of the components of PCLM coefficients. Default value: 2. |
opt.method | Selection criterion of the model. Possible values are "BIC" (default) and "AIC". |
max.iter | Maximal number of iterations used in fitting procedure. |
tol | Relative tolerance in PCLM fitting procedure. Default: 0.1% i.e. the estimated aggregate bins should be in the 0.1% error margin. |
Value
A list with exactly eight control parameters.
See Also
Examples
control.pclm2D()
Create an additional bin with a small value at the end. Improves convergence.
Description
Create an additional bin with a small value at the end. Improves convergence.
Usage
create.artificial.bin(i, vy = 1, vo = 1.01)
Arguments
i | A list of input values corresponding to pclm or pclm2D; |
vy | Numerical value of the bin created for |
vo | Numerical values of the bin created for |
Delete from results the last group added artificially in pclm and pclm2D
Description
Delete from results the last group added artificially in pclm and pclm2D
Usage
delete.artificial.bin(M)
Arguments
M | A pclm.fit object |
Extract Fractional Part of a Number
Description
Extract Fractional Part of a Number
Usage
frac(x)
Arguments
x | A numeric value, vector or matrix |
Map groups and borders
Description
We assume no missing values between the bins
Usage
map.bins(x, nlast, out.step)
Arguments
x | Vector containing the starting values of the input intervals/bins. For example: if we have 3 bins |
nlast | Length of the last interval. In the example above |
out.step | Length of estimated intervals in output. Values between 0.1 and 1 are accepted. Default: 1. |
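Examples
A quick inspection sketch (not from the original manual); map.bins is internal, so the ':::' access below is an assumption, and str() is used because the exact return structure is not documented on this page.
# three input bins starting at 0, 5 and 10, last bin of length 5, unit output grid
str(ungroup:::map.bins(x = c(0, 5, 10), nlast = 5, out.step = 1))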
Objective function
Description
Objective function
Usage
ofun(L, I, type)
Arguments
L | lambda.hat |
I | Input object from pclm function |
type | Type of PCLM model. Options: |
Optimize Smoothing Parameters
Description
This function optimizes the search for lambda, kr and deg; see control.pclm for their meaning. The optimization process works in steps. Simultaneous optimization was tested and found inefficient.
Usage
optimize_par(I, type)
Arguments
I | Input object from pclm function |
type | Type of PCLM model. Options: |
Univariate Penalized Composite Link Model (PCLM)
Description
Fit univariate penalized composite link model (PCLM) to ungroup binned count data, e.g. age-at-death distributions grouped in age classes.
Usage
pclm(
  x,
  y,
  nlast,
  offset = NULL,
  out.step = 1,
  ci.level = 95,
  verbose = FALSE,
  control = list()
)
Arguments
x | Vector containing the starting values of the input intervals/bins. For example: if we have 3 bins |
y | Vector with counts to be ungrouped. It must have the same dimension as x. |
nlast | Length of the last interval. In the example above |
offset | Optional offset term to calculate smooth mortality rates. A vector of the same length as x and y. See Rizzi et al. (2015) for further details. |
out.step | Length of estimated intervals in output. Values between 0.1 and 1 are accepted. Default: 1. |
ci.level | Level of significance for computing confidence intervals. Default: 95. |
verbose | Logical value. Indicates whether a progress bar should be shown or not. Default: FALSE. |
control | List with additional parameters; see control.pclm for details. |
Details
The PCLM method is based on the composite link model, which extends standard generalized linear models. It implements the idea that the observed counts, interpreted as realizations from Poisson distributions, are indirect observations of a finer (ungrouped) but latent sequence. This latent sequence represents the distribution of expected means on a fine resolution and has to be estimated from the aggregated data. Estimates are obtained by maximizing a penalized likelihood. This maximization is performed efficiently by a version of the iteratively reweighted least-squares algorithm. Optimal values of the smoothing parameter are chosen by minimizing Bayesian or Akaike's Information Criterion.
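To make the composite link structure concrete, the following toy sketch (illustrative only, not the package's estimation code) shows the quantities involved: a latent fine-grid sequence gamma = exp(B %*% beta) built on a basis B, aggregated into the observed bins by a composition matrix C so that the expected binned counts are mu = C %*% gamma, with a difference penalty on beta added to the Poisson log-likelihood.
# Toy dimensions: 4 fine-grid cells composed into 2 observed bins
B <- diag(4)                           # basis matrix (identity here for simplicity)
beta <- c(0.1, 0.2, 0.4, 0.3)          # coefficients on the fine grid
gamma <- exp(B %*% beta)               # latent expected counts on the fine grid
C <- rbind(c(1, 1, 0, 0),              # composition matrix: bin 1 = cells 1-2
           c(0, 0, 1, 1))              #                     bin 2 = cells 3-4
mu <- C %*% gamma                      # expected binned counts (composite link)
y <- c(3, 4)                           # observed binned counts (toy values)
lambda <- 10                           # smoothing parameter
D <- diff(diag(4), differences = 2)    # second-order difference matrix
penalized_loglik <- sum(y * log(mu) - mu) - lambda / 2 * sum((D %*% beta)^2)
penalized_loglik                       # pclm maximizes an objective of this form over beta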
Value
The output is a list with the following components:
input | A list with arguments provided in input. Saved for convenience. |
fitted | The fitted values of the PCLM model. |
ci | Confidence intervals around fitted values. |
goodness.of.fit | A list containing goodness of fit measures: standard errors, AIC and BIC. |
smoothPar | Estimated smoothing parameters: |
bins.definition | Additional values to identify the bins limits and location in input and output objects. |
deep | A list of objects created in the fitting process. Useful in diagnosis of possible issues. |
call | An unevaluated function call, that is, an unevaluated expression which consists of the named function applied to the given arguments. |
References
Rizzi S, Gampe J, Eilers PHC (2015).“Efficient Estimation of Smooth Distributions From Coarsely Grouped Data.”American Journal of Epidemiology,182(2), 138-147.doi:10.1093/aje/kwv020.
See Also
Examples
# Data
x <- c(0, 1, seq(5, 85, by = 5))
y <- c(294, 66, 32, 44, 170, 284, 287, 293, 361, 600, 998, 1572, 2529, 4637,
       6161, 7369, 10481, 15293, 39016)
offset <- c(114, 440, 509, 492, 628, 618, 576, 580, 634, 657, 631, 584, 573,
            619, 530, 384, 303, 245, 249) * 1000
nlast <- 26 # the size of the last interval

# Example 1 ----------------------
M1 <- pclm(x, y, nlast)
ls(M1)
summary(M1)
fitted(M1)
plot(M1)

# Example 2 ----------------------
# ungroup even in smaller intervals
M2 <- pclm(x, y, nlast, out.step = 0.5)
head(fitted(M1))
plot(M1, type = "s")
# Note, in example 1 we are estimating intervals of length 1. In example 2
# we are estimating intervals of length 0.5 using the same aggregate data.

# Example 3 ----------------------
# Do not optimise smoothing parameters; choose your own. Faster.
M3 <- pclm(x, y, nlast, out.step = 0.5,
           control = list(lambda = 100, kr = 10, deg = 10))
plot(M3)
summary(M2)
summary(M3) # not the smallest BIC here, but sometimes is not important.

# Example 4 -----------------------
# Grouped x & grouped offset (estimate death rates)
M4 <- pclm(x, y, nlast, offset)
plot(M4, type = "s")

# Example 5 -----------------------
# Grouped x & ungrouped offset (estimate death rates)
ungroupped_Ex <- pclm(x, y = offset, nlast, offset = NULL)$fitted # ungroupped offset data
M5 <- pclm(x, y, nlast, offset = ungroupped_Ex)
Compute Confidence Intervals for PCLM output
Description
Compute Confidence Intervals for PCLM output
Usage
pclm.confidence(fit, out.step, y, SE, ci.level, type, offset)
Arguments
fit | Fitted values from pclm.fit |
out.step | Length of estimated intervals in output. Values between 0.1 and 1 are accepted. Default: 1. |
y | Vector with counts to be ungrouped. It must have the same dimension as |
SE | Standard Errors |
ci.level | Level of significance for computing confidence intervals. Default: |
type | Type of PCLM model. Options: |
offset | Optional offset term to calculate smooth mortality rates. A vector of the same length as x and y. See Rizzi et al. (2015) for further details. |
Compute Confidence Intervals for estimated distribution
Description
Compute Confidence Intervals for estimated distribution
Usage
pclm.confidence.dx(X)
Arguments
X | List of inputs from pclm.confidence function |
Compute Confidence Intervals for estimated hazard
Description
Compute Confidence Intervals for estimated hazard
Usage
pclm.confidence.mx(X)
Arguments
X | List of inputs from pclm.confidence function |
Fit PCLM Models
Description
This is an internal function used to estimate the PCLM model. It is used by the pclm and pclm2D functions.
Usage
pclm.fit(
  x,
  y,
  nlast,
  offset,
  out.step,
  verbose,
  lambda,
  kr,
  deg,
  diff,
  max.iter,
  tol,
  type
)
Arguments
x | Vector containing the starting values of the input intervals/bins. For example: if we have 3 bins |
y | Vector with counts to be ungrouped. It must have the same dimension as x. |
nlast | Length of the last interval. In the example above |
offset | Optional offset term to calculate smooth mortality rates. A vector of the same length as x and y. See Rizzi et al. (2015) for further details. |
out.step | Length of estimated intervals in output. Values between 0.1 and 1 are accepted. Default: 1. |
verbose | Logical value. Indicates whether a progress bar should be shown or not. Default: |
lambda | Smoothing parameter to be used in pclm estimation. If |
kr | Knot ratio. Number of internal intervals used for defining 1 knot in B-spline basis construction. See |
deg | Degree of the splines needed to create equally-spaced B-splines basis over an abscissa of data. |
diff | An integer indicating the order of differences of the components of PCLM coefficients. Default value: 2. |
max.iter | Maximal number of iterations used in fitting procedure. |
tol | Relative tolerance in PCLM fitting procedure. Default: 0.1% i.e. the estimated aggregate bins should be in the 0.1% error margin. |
type | Type of PCLM model. Options: |
Validate input values
Description
Validate input values
Usage
pclm.input.check(X, pclm.type)
Arguments
X | A list with input arguments provided in |
Two-Dimensional Penalized Composite Link Model (PCLM-2D)
Description
Fit two-dimensional penalized composite link model (PCLM-2D), e.g. simultaneous ungrouping of age-at-death distributions grouped in age classes for adjacent years. The PCLM can be extended to a two-dimensional regression problem. This is particularly suitable for mortality analysis when mortality surfaces are to be estimated to capture both age-specific trajectories of coarsely grouped distributions and time trends (Rizzi et al. 2019).
Usage
pclm2D(
  x,
  y,
  nlast,
  offset = NULL,
  out.step = 1,
  ci.level = 95,
  verbose = TRUE,
  control = list()
)
Arguments
x | Vector containing the starting values of the input intervals/bins. For example: if we have 3 bins |
y | |
nlast | Length of the last interval. In the example above |
offset | Optional offset term to calculate smooth mortality rates. A vector of the same length as x and y. See Rizzi et al. (2015) for further details. |
out.step | Length of estimated intervals in output. Values between 0.1 and 1 are accepted. Default: 1. |
ci.level | Level of significance for computing confidence intervals. Default: 95. |
verbose | Logical value. Indicates whether a progress bar should be shown or not. Default: TRUE. |
control | List with additional parameters; see control.pclm2D for details. |
Value
The output is a list with the following components:
input | A list with arguments provided in input. Saved for convenience. |
fitted | The fitted values of the PCLM model. |
ci | Confidence intervals around fitted values. |
goodness.of.fit | A list containing goodness of fit measures: standard errors, AIC and BIC. |
smoothPar | Estimated smoothing parameters: |
bins.definition | Additional values to identify the bins limits and location in input and output objects. |
deep | A list of objects created in the fitting process. Useful in diagnosis of possible issues. |
call | An unevaluated function call, that is, an unevaluated expression which consists of the named function applied to the given arguments. |
References
Rizzi S, Gampe J, Eilers PHC (2015).“Efficient Estimation of Smooth Distributions From Coarsely Grouped Data.”American Journal of Epidemiology,182(2), 138-147.doi:10.1093/aje/kwv020.
Rizzi S, Halekoh U, Thinggaard M, Engholm G, Christensen N, Johannesen TB, Lindahl-Jacobsen R (2019).“How to estimate mortality trends from grouped vital statistics.”International Journal of Epidemiology,48(2), 571–582.doi:10.1093/ije/dyy183.
See Also
Examples
# Input data
Dx <- ungroup.data$Dx
Ex <- ungroup.data$Ex

# Aggregate data to be ungrouped in the examples below
# Select a 10y data frame
x <- c(0, 1, seq(5, 85, by = 5))
nlast <- 26
n <- c(diff(x), nlast)
group <- rep(x, n)
y <- aggregate(Dx, by = list(group), FUN = "sum")[, 2:10]
offset <- aggregate(Ex, by = list(group), FUN = "sum")[, 2:10]

# Example 1 ----------------------
# Fit model and ungroup data using PCLM-2D
P1 <- pclm2D(x, y, nlast)
summary(P1)

# Plot fitted values
plot(P1)
# Plot input data
plot(P1, "observed")

# NOTE: pclm2D does not search for optimal smoothing parameters by default
# (like pclm does) because it is more time consuming. If optimization is
# required set lambda = c(NA, NA):
P1 <- pclm2D(x, y, nlast, control = list(lambda = c(NA, NA)))

# Example 2 ----------------------
# Ungroup and build a mortality surface
P2 <- pclm2D(x, y, nlast, offset)
summary(P2)
plot(P2, type = "observed")
plot(P2, type = "fitted")
plot(P2, type = "fitted", colors = c("blue", "red"))
Generic Plot for pclm Class
Description
Generic Plot for pclm Class
Usage
## S3 method for class 'pclm'
plot(x, xlab, ylab, ylim, type, lwd, col, legend, legend.position, ...)
Arguments
x | An object of class |
xlab | a label for the x axis, defaults to a description of |
ylab | a label for the y axis, defaults to a description of |
ylim | the y limits of the plot. |
type | 1-character string giving the type of plot desired. The following values are possible, for details, see plot: "p" for points, "l" for lines, "b" for both points and lines, "c" for empty points joined by lines, "o" for overplotted points and lines, "s" and "S" for stair steps and "h" for histogram-like vertical lines. Finally, "n" does not produce any points or lines. |
lwd | Line width, a positive number, defaulting to 2. |
col | Three colours to be used in the plot for observed values, fitted values and confidence intervals. |
legend | a character or expression vector of length |
legend.position | Legend position, or the x and y co-ordinates to be used to position the legend. |
... | other graphical parameters (see par for more details). |
See Also
Examples
# See complete examples in pclm help page
Generic Plot for pclm2D Class
Description
The generic plot for a pclm2D object is constructed using the persp method.
Usage
## S3 method for class 'pclm2D'
plot(
  x,
  type = c("fitted", "observed"),
  colors = c("#b6e3db", "#e5d9c2", "#b5ba61", "#725428"),
  nbcol = 25,
  xlab = "x",
  ylab = "y",
  zlab = "values",
  phi = 30,
  theta = 210,
  border = "grey50",
  ticktype = "simple",
  ...
)
Arguments
x | an object of class |
type | chart type. Defines which data are plotted: "fitted" (default) or "observed". |
colors | colors to interpolate; must be a valid argument to |
nbcol | dimension of the color palette. Number of colors. Default: 25. |
xlab,ylab,zlab | titles for the axes. N.B. These must be character strings; expressions are not accepted. Numbers will be coerced to character strings. |
theta,phi | angles defining the viewing direction. |
border | the color of the line drawn around the surface facets. The default, |
ticktype | character: |
... | any other argument to be passed to |
See Also
Examples
# See complete examples in pclm2D help page
Print for pclm method
Description
Print for pclm method
Usage
## S3 method for class 'pclm'
print(x, ...)
Arguments
x | An object of class |
... | further arguments passed to or from other methods. |
Print method for pclm2D
Description
Print method for pclm2D
Usage
## S3 method for class 'pclm2D'
print(x, ...)
Arguments
x | An object of class |
... | further arguments passed to or from other methods. |
Print for summary.pclm method
Description
Print for summary.pclm method
Usage
## S3 method for class 'summary.pclm'
print(x, ...)
Arguments
x | An object of class |
... | further arguments passed to or from other methods. |
Print method for summary.pclm2D
Description
Print method for summary.pclm2D
Usage
## S3 method for class 'summary.pclm2D'
print(x, ...)
Arguments
x | An object of class |
... | further arguments passed to or from other methods. |
Print function for ungroup.data
Description
Print function for ungroup.data
Usage
## S3 method for class 'ungroup.data'
print(x, ...)
Arguments
x | An |
... | further arguments passed to or from other methods. |
Extract PCLM Deviance Residuals
Description
Extract PCLM Deviance Residuals
Usage
## S3 method for class 'pclm'
residuals(object, ...)
Arguments
object | an object for which the extraction of model residuals is meaningful. |
... | other arguments. |
Value
Residuals extracted from the object object.
Examples
x <- c(0, 1, seq(5, 85, by = 5))
y <- c(294, 66, 32, 44, 170, 284, 287, 293, 361, 600, 998, 1572, 2529, 4637,
       6161, 7369, 10481, 15293, 39016)
M1 <- pclm(x, y, nlast = 26)
residuals(M1)
Extract PCLM-2D Deviance Residuals
Description
Extract PCLM-2D Deviance Residuals
Usage
## S3 method for class 'pclm2D'
residuals(object, ...)
Arguments
object | an object for which the extraction of model residuals is meaningful. |
... | other arguments. |
Value
Residuals extracted from the object object.
Examples
Dx <- ungroup.data$Dx
Ex <- ungroup.data$Ex
# Aggregate data to ungroup it in the example below
x <- c(0, 1, seq(5, 85, by = 5))
nlast <- 26
n <- c(diff(x), nlast)
group <- rep(x, n)
y <- aggregate(Dx, by = list(group), FUN = "sum")[, -1]
# Example
P1 <- pclm2D(x, y, nlast)
residuals(P1)
Sequence function with last value
Description
Sequence function with last value
Usage
seqlast(from, to, by)
Arguments
from,to | the starting and (maximal) end values of the sequence. Of length |
by | number: increment of the sequence. |
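Examples
A hedged illustration (not from the original manual) of what the title suggests: a sequence like seq(), but presumably keeping the end value as the last element even when the increment overshoots it. The internal function is reached with ':::'.
ungroup:::seqlast(from = 0, to = 10, by = 3)
seq(from = 0, to = 10, by = 3)   # for comparison, base seq() stops at 9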
Suggest values of out.step that do not require an adjustment of nlast
Description
Suggest values of out.step that do not require an adjustment of nlast
Usage
suggest.valid.out.step(len, increment = 0.01)
Arguments
len | Interval length |
increment | Increment |
Summary for pclm method
Description
Summary for pclm method
Usage
## S3 method for class 'pclm'
summary(object, ...)
Arguments
object | an object for which a summary is desired. |
... | additional arguments affecting the summary produced. |
Summary method for pclm2D
Description
Generic function used to produce result summaries of the results produced by pclm2D.
Usage
## S3 method for class 'pclm2D'
summary(object, ...)
Arguments
object | an object for which a summary is desired. |
... | additional arguments affecting the summary produced. |
Test Dataset in the Package
Description
Dataset containing death counts (Dx) and exposures (Ex) by age for a certain population between 1980 and 2014. The dataset is provided for testing purposes only and might be altered and outdated. Download actual demographic data free of charge from the Human Mortality Database (2018). Once a username and a password are created on the website, the MortalityLaws R package can be used to extract data in R format.
Usage
ungroup.data
Format
An object of class ungroup.data of length 2.
Source
References
Human Mortality Database (2018). University of California, Berkeley (USA), and Max Planck Institute for Demographic Research (Germany). Data downloaded on 17/01/2018. https://www.mortality.org.
Pascariu MD (2018). MortalityLaws: Parametric Mortality Models, Life Tables and HMD. R package version 1.6.0, https://github.com/mpascariu/MortalityLaws.
See Also
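Examples
A short sketch (not part of the original manual) for inspecting the bundled test dataset.
library(ungroup)
str(ungroup.data, max.level = 1)   # a list of length 2 with components Dx and Ex
head(ungroup.data$Dx)              # death counts by age and year
head(ungroup.data$Ex)              # exposures by age and year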
Check if nlast needs to be adjusted in order to accommodate out.step
Description
Check if nlast needs to be adjusted in order to accommodate out.step
Usage
validate.nlast(x, nlast, out.step)
Arguments
x | Vector containing the starting values of the input intervals/bins. For example: if we have 3 bins |
nlast | Length of the last interval. In the example above |
out.step | Length of estimated intervals in output. Values between 0.1 and 1 are accepted. Default: 1. |