Movatterモバイル変換


[0]ホーム

URL:


Type:Package
Title:Penalized Composite Link Model for Efficient Estimation ofSmooth Distributions from Coarsely Binned Data
Version:1.4.4
Description:Versatile method for ungrouping histograms (binned count data) assuming that counts are Poisson distributed and that the underlying sequence on a fine grid to be estimated is smooth. The method is based on the composite link model and estimation is achieved by maximizing a penalized likelihood. Smooth detailed sequences of counts and rates are so estimated from the binned counts. Ungrouping binned data can be desirable for many reasons: Bins can be too coarse to allow for accurate analysis; comparisons can be hindered when different grouping approaches are used in different histograms; and the last interval is often wide and open-ended and, thus, covers a lot of information in the tail area. Age-at-death distributions grouped in age classes and abridged life tables are examples of binned data. Because of modest assumptions, the approach is suitable for many demographic and epidemiological applications. For a detailed description of the method and applications see Rizzi et al. (2015) <doi:10.1093/aje/kwv020>.
License:MIT + file LICENSE
LazyData:TRUE
Depends:R (≥ 3.4.0)
Imports:pbapply (≥ 1.3), Rcpp (≥ 0.12.0), Rdpack (≥ 0.8), Matrix
LinkingTo:Rcpp, RcppEigen
Suggests:MortalityLaws (≥ 1.5.0), knitr (≥ 1.20), rmarkdown (≥1.10), testthat (≥ 2.0.0)
RdMacros:Rdpack
URL:https://github.com/mpascariu/ungroup
BugReports:https://github.com/mpascariu/ungroup/issues
VignetteBuilder:knitr
Encoding:UTF-8
RoxygenNote:7.3.0
NeedsCompilation:yes
Packaged:2024-01-29 15:00:04 UTC; mpascariu
Author:Marius D. PascariuORCID iD [aut, cre], Silvia Rizzi [aut], Jonas SchoeleyORCID iD [aut], Maciej J. DankoORCID iD [aut]
Maintainer:Marius D. Pascariu <rpascariu@outlook.com>
Repository:CRAN
Date/Publication:2024-01-31 13:00:02 UTC

ungroup: Penalized Composite Link Model for Efficient Estimation of Smooth Distributions from Coarsely Binned Data

Description

Versatile method for ungrouping histograms (binned count data) assuming that counts are Poisson distributed and that the underlying sequence on a fine grid to be estimated is smooth. The method is based on the composite link model and estimation is achieved by maximizing a penalized likelihood. Smooth detailed sequences of counts and rates are so estimated from the binned counts. Ungrouping binned data can be desirable for many reasons: Bins can be too coarse to allow for accurate analysis; comparisons can be hindered when different grouping approaches are used in different histograms; and the last interval is often wide and open-ended and, thus, covers a lot of information in the tail area. Age-at-death distributions grouped in age classes and abridged life tables are examples of binned data. Because of modest assumptions, the approach is suitable for many demographic and epidemiological applications. For a detailed description of the method and applications see Rizzi et al. (2015)doi:10.1093/aje/kwv020.

Details

To learn more about the package, start with the vignettes:browseVignettes(package = "ungroup")

Author(s)

Maintainer: Marius D. Pascariurpascariu@outlook.com (ORCID)

Authors:

References

Currie ID, Durban M, Eilers PH (2004).“Smoothing and forecasting mortality rates.”Statistical modelling,4(4), 279–298.

Eilers PH (2007).“Ill-posed problems with counts, the composite link model and penalized likelihood.”Statistical Modelling,7(3), 239-254.doi:10.1177/1471082X0700700302.

Hastie TJ, Tibshirani RJ (1990).“Generalized additive models.”Monographs on Statistics and Applied Probability,43.

Human Mortality Database (2018).“University of California, Berkeley (USA), and Max Planck Institute for Demographic Research (Germany). Data downloaded on 17/01/2018.”https://www.mortality.org.

Pascariu MD (2018).MortalityLaws: Parametric Mortality Models, Life Tables and HMD.R package version 1.6.0,https://github.com/mpascariu/MortalityLaws.

Rizzi S, Gampe J, Eilers PHC (2015).“Efficient Estimation of Smooth Distributions From Coarsely Grouped Data.”American Journal of Epidemiology,182(2), 138-147.doi:10.1093/aje/kwv020.

Rizzi S, Halekoh U, Thinggaard M, Engholm G, Christensen N, Johannesen TB, Lindahl-Jacobsen R (2019).“How to estimate mortality trends from grouped vital statistics.”International Journal of Epidemiology,48(2), 571–582.doi:10.1093/ije/dyy183.

Rizzi S, Thinggaard M, Engholm G, Christensen N, Johannesen TB, Vaupel JW, Lindahl-Jacobsen R (2016).“Comparison of non-parametric methods for ungrouping coarsely aggregated data.”BMC medical research methodology,16(1), 59.doi:10.1186/s12874-016-0157-8.

Thompson R, Baker RJ (1981).“Composite link functions in generalized linear models.”Applied Statistics, 125–131.

See Also

Useful links:


PCLM Akaike Information Criterion

Description

Generic function calculating Akaike's ‘An Information Criterion’ forone or several fitted model objects for which a log-likelihood valuecan be obtained, according to the formula-2 \mbox{log-likelihood} + k n_{par},wheren_{par} represents the number of parameters in thefitted model, andk = 2 for the usual AIC, ork = \log(n)(n being the number of observations) for the so-called BIC or SBC(Schwarz's Bayesian criterion).

Usage

## S3 method for class 'pclm'AIC(object, ..., k = 2)

Arguments

object

a fitted model object for which there exists alogLik method to extract the corresponding log-likelihood, oran object inheriting from classlogLik.

...

optionally more fitted model objects.

k

numeric, thepenalty per parameter to be used; thedefaultk = 2 is the classical AIC.

Details

When comparing models fitted by maximum likelihood to the same data,the smaller the AIC or BIC, the better the fit.

The theory of AIC requires that the log-likelihood has been maximized:whereas AIC can be computed for models not fitted by maximumlikelihood, their AIC values should not be compared.

Examples of models not ‘fitted to the same data’ are where theresponse is transformed (accelerated-life models are fitted tolog-times) and where contingency tables have been used to summarizedata.

These are generic functions (with S4 generics defined in packagestats4): however methods should be defined for thelog-likelihood functionlogLik rather than thesefunctions: the action of their default methods is to calllogLikon all the supplied objects and assemble the results. Note that inseveral common caseslogLik does not return the value atthe MLE: see its help page.

The log-likelihood and hence the AIC/BIC is only defined up to anadditive constant. Different constants have conventionally been usedfor different purposes and soextractAIC andAICmay give different values (and do for models of class"lm": seethe help forextractAIC). Particular care is neededwhen comparing fits of different classes (with, for example, acomparison of a Poisson and gamma GLM being meaningless since one hasa discrete response, the other continuous).

BIC is defined asAIC(object, ..., k = log(nobs(object))).This needs the number of observations to be known: the default methodlooks first for a"nobs" attribute on the return value from thelogLik method, then tries thenobsgeneric, and if neither succeed returns BIC asNA.

Value

If just one object is provided, a numeric value with the correspondingAIC (or BIC, or ..., depending onk).

If multiple objects are provided, adata.frame with rowscorresponding to the objects and columns representing the number ofparameters in the model (df) and the AIC or BIC.

References

Sakamoto, Y., Ishiguro, M., and Kitagawa G. (1986).Akaike Information Criterion Statistics.D. Reidel Publishing Company.


PCLM-2D Akaike Information Criterion

Description

Generic function calculating Akaike's ‘An Information Criterion’ forone or several fitted model objects for which a log-likelihood valuecan be obtained, according to the formula-2 \mbox{log-likelihood} + k n_{par},wheren_{par} represents the number of parameters in thefitted model, andk = 2 for the usual AIC, ork = \log(n)(n being the number of observations) for the so-called BIC or SBC(Schwarz's Bayesian criterion).

Usage

## S3 method for class 'pclm2D'AIC(object, ..., k = 2)

Arguments

object

a fitted model object for which there exists alogLik method to extract the corresponding log-likelihood, oran object inheriting from classlogLik.

...

optionally more fitted model objects.

k

numeric, thepenalty per parameter to be used; thedefaultk = 2 is the classical AIC.

Details

When comparing models fitted by maximum likelihood to the same data,the smaller the AIC or BIC, the better the fit.

The theory of AIC requires that the log-likelihood has been maximized:whereas AIC can be computed for models not fitted by maximumlikelihood, their AIC values should not be compared.

Examples of models not ‘fitted to the same data’ are where theresponse is transformed (accelerated-life models are fitted tolog-times) and where contingency tables have been used to summarizedata.

These are generic functions (with S4 generics defined in packagestats4): however methods should be defined for thelog-likelihood functionlogLik rather than thesefunctions: the action of their default methods is to calllogLikon all the supplied objects and assemble the results. Note that inseveral common caseslogLik does not return the value atthe MLE: see its help page.

The log-likelihood and hence the AIC/BIC is only defined up to anadditive constant. Different constants have conventionally been usedfor different purposes and soextractAIC andAICmay give different values (and do for models of class"lm": seethe help forextractAIC). Particular care is neededwhen comparing fits of different classes (with, for example, acomparison of a Poisson and gamma GLM being meaningless since one hasa discrete response, the other continuous).

BIC is defined asAIC(object, ..., k = log(nobs(object))).This needs the number of observations to be known: the default methodlooks first for a"nobs" attribute on the return value from thelogLik method, then tries thenobsgeneric, and if neither succeed returns BIC asNA.

Value

If just one object is provided, a numeric value with the correspondingAIC (or BIC, or ..., depending onk).

If multiple objects are provided, adata.frame with rowscorresponding to the objects and columns representing the number ofparameters in the model (df) and the AIC or BIC.

References

Sakamoto, Y., Ishiguro, M., and Kitagawa G. (1986).Akaike Information Criterion Statistics.D. Reidel Publishing Company.


PCLM Bayes Information Criterion

Description

Generic function calculating Akaike's ‘An Information Criterion’ forone or several fitted model objects for which a log-likelihood valuecan be obtained, according to the formula-2 \mbox{log-likelihood} + k n_{par},wheren_{par} represents the number of parameters in thefitted model, andk = 2 for the usual AIC, ork = \log(n)(n being the number of observations) for the so-called BIC or SBC(Schwarz's Bayesian criterion).

Usage

## S3 method for class 'pclm'BIC(object, ...)

Arguments

object

a fitted model object for which there exists alogLik method to extract the corresponding log-likelihood, oran object inheriting from classlogLik.

...

optionally more fitted model objects.

Details

When comparing models fitted by maximum likelihood to the same data,the smaller the AIC or BIC, the better the fit.

The theory of AIC requires that the log-likelihood has been maximized:whereas AIC can be computed for models not fitted by maximumlikelihood, their AIC values should not be compared.

Examples of models not ‘fitted to the same data’ are where theresponse is transformed (accelerated-life models are fitted tolog-times) and where contingency tables have been used to summarizedata.

These are generic functions (with S4 generics defined in packagestats4): however methods should be defined for thelog-likelihood functionlogLik rather than thesefunctions: the action of their default methods is to calllogLikon all the supplied objects and assemble the results. Note that inseveral common caseslogLik does not return the value atthe MLE: see its help page.

The log-likelihood and hence the AIC/BIC is only defined up to anadditive constant. Different constants have conventionally been usedfor different purposes and soextractAIC andAICmay give different values (and do for models of class"lm": seethe help forextractAIC). Particular care is neededwhen comparing fits of different classes (with, for example, acomparison of a Poisson and gamma GLM being meaningless since one hasa discrete response, the other continuous).

BIC is defined asAIC(object, ..., k = log(nobs(object))).This needs the number of observations to be known: the default methodlooks first for a"nobs" attribute on the return value from thelogLik method, then tries thenobsgeneric, and if neither succeed returns BIC asNA.

Value

If just one object is provided, a numeric value with the correspondingAIC (or BIC, or ..., depending onk).

If multiple objects are provided, adata.frame with rowscorresponding to the objects and columns representing the number ofparameters in the model (df) and the AIC or BIC.

References

Sakamoto, Y., Ishiguro, M., and Kitagawa G. (1986).Akaike Information Criterion Statistics.D. Reidel Publishing Company.


PCLM-2D Akaike Information Criterion

Description

Generic function calculating Akaike's ‘An Information Criterion’ forone or several fitted model objects for which a log-likelihood valuecan be obtained, according to the formula-2 \mbox{log-likelihood} + k n_{par},wheren_{par} represents the number of parameters in thefitted model, andk = 2 for the usual AIC, ork = \log(n)(n being the number of observations) for the so-called BIC or SBC(Schwarz's Bayesian criterion).

Usage

## S3 method for class 'pclm2D'BIC(object, ..., k = 2)

Arguments

object

a fitted model object for which there exists alogLik method to extract the corresponding log-likelihood, oran object inheriting from classlogLik.

...

optionally more fitted model objects.

k

numeric, thepenalty per parameter to be used; thedefaultk = 2 is the classical AIC.

Details

When comparing models fitted by maximum likelihood to the same data,the smaller the AIC or BIC, the better the fit.

The theory of AIC requires that the log-likelihood has been maximized:whereas AIC can be computed for models not fitted by maximumlikelihood, their AIC values should not be compared.

Examples of models not ‘fitted to the same data’ are where theresponse is transformed (accelerated-life models are fitted tolog-times) and where contingency tables have been used to summarizedata.

These are generic functions (with S4 generics defined in packagestats4): however methods should be defined for thelog-likelihood functionlogLik rather than thesefunctions: the action of their default methods is to calllogLikon all the supplied objects and assemble the results. Note that inseveral common caseslogLik does not return the value atthe MLE: see its help page.

The log-likelihood and hence the AIC/BIC is only defined up to anadditive constant. Different constants have conventionally been usedfor different purposes and soextractAIC andAICmay give different values (and do for models of class"lm": seethe help forextractAIC). Particular care is neededwhen comparing fits of different classes (with, for example, acomparison of a Poisson and gamma GLM being meaningless since one hasa discrete response, the other continuous).

BIC is defined asAIC(object, ..., k = log(nobs(object))).This needs the number of observations to be known: the default methodlooks first for a"nobs" attribute on the return value from thelogLik method, then tries thenobsgeneric, and if neither succeed returns BIC asNA.

Value

If just one object is provided, a numeric value with the correspondingAIC (or BIC, or ..., depending onk).

If multiple objects are provided, adata.frame with rowscorresponding to the objects and columns representing the number ofparameters in the model (df) and the AIC or BIC.

References

Sakamoto, Y., Ishiguro, M., and Kitagawa G. (1986).Akaike Information Criterion Statistics.D. Reidel Publishing Company.


Construct B-spline basis

Description

This is an internal function of package MortalitySmooth which creates equally-spaced B-splines basis over an abscissa of data.

Usage

MortSmooth_bbase(x, xl, xr, ndx, deg)

Arguments

x

vector for the abscissa of data;

xl

left boundary;

xr

right boundary;

ndx

number of internal knots minus one or number of internal intervals;

deg

degree of the splines.

Details

The function reproduce an algorithm presented by Eilers and Marx (2010) using differences of truncated power functions (see MortSmooth_tpower). The final matrix has a single B-spline for each of the [ndx + deg] columns. The number of rows is equal to the length of x.

The function differs from bs in the package splines since it automatically constructed B-splines with identical shape. This would allow a simple interpretation of coefficients and application of simple differencing.

Value

A matrix containing equally-spaced B-splines of degree deg along x for each column.

Author(s)

Carlo G Camarda


Truncated p-th Power Function

Description

This is an internal function of package MortalitySmooth which constructs a truncated p-th power function along an abscissa within the function MortSmooth_bbase

Usage

MortSmooth_tpower(x, tt, p)

Arguments

x

for the abscissa of data;

tt

vector of truncation points;

p

degree of the power

Details

Internal function used in MortSmooth_bbase. The vector t contains the knots. The simplest system of truncated power functions uses p = 0 and it consists of step functions with jumps of size 1 at the truncation points t.

Author(s)

Carlo G Camarda


Construct B-spline basisThis is an internal function which constructs B-spline basis to be used in pclm estimation

Description

Construct B-spline basisThis is an internal function which constructs B-spline basis to be used in pclm estimation

Usage

build_B_spline_basis(X, Y, kr, deg, diff, type)

Arguments

X

vector with ages

Y

vector with years

kr

Knot ratio. Number of internal intervals used for defining 1 knot in B-spline basis construction. SeeMortSmooth_bbase.

deg

Degree of the splines needed to create equally-spaced B-splines basis over an abscissa of data.

diff

An integer indicating the order of differences of the components of PCLM coefficients. Default value: 2.

type

Type of PCLM model. Options:"1D", "2D" for univariate and two-dimensional model respectively.

See Also

MortSmooth_bbase


Build Composition Matrices

Description

Build Composition Matrices

Usage

build_C_matrix(x, y, nlast, offset, out.step, type)

Arguments

x

Vector containing the starting values of the input intervals/bins.For example: if we have 3 bins[0,5), [5,10) and [10, 15),x will be defined by the vector:c(0, 5, 10).

y

Vector with counts to be ungrouped. It must have the same dimension asx.

nlast

Length of the last interval. In the example abovenlast would be 5.

offset

Optional offset term to calculate smooth mortality rates. A vector of the same length as x and y. See Rizzi et al. (2015) for further details.

out.step

Length of estimated intervals in output. Values between 0.1 and 1 are accepted. Default: 1.

type

Type of PCLM model. Options:"1D", "2D" for univariate and two-dimensional model respectively.


Construct Penalty Matrix

Description

Construct Penalty Matrix

Usage

build_P_matrix(BA, BY, lambda, type)

Arguments

BA

B-spline basis object for age axis

BY

B-spline basis object for year axis

lambda

Smoothing parameter to be used in pclm estimation.Iflambda = NA an algorithm will find the optimal values.

type

Type of PCLM model. Options:"1D", "2D" for univariate and two-dimensional model respectively.


Compute Standard Errors and Confidence Intervals

Description

Compute Standard Errors and Confidence Intervals

Usage

compute_standard_errors(B, QmQ, QmQP)

Arguments

B

B-spline basis matrix

QmQ

Object generated by pclm.fit

QmQP

Object generated by pclm.fit


Auxiliary for Controllingpclm Fitting

Description

Auxiliary for Controllingpclm Fitting

Usage

control.pclm(lambda     = NA,             kr         = 2,             deg        = 3,             int.lambda = c(0.1, 1e+5),             diff       = 2,             opt.method = c("BIC", "AIC"),             max.iter   = 1e+3,             tol        = 1e-3)

Arguments

lambda

Smoothing parameter to be used in pclm estimation.Iflambda = NA an algorithm will find the optimal values.

kr

Knot ratio. Number of internal intervals used for defining 1 knot in B-spline basis construction. SeeMortSmooth_bbase.

deg

Degree of the splines needed to create equally-spaced B-splines basis over an abscissa of data.

int.lambda

Iflambda is optimized an interval to be searched needs to be specified. Format: vector containing the end-points.

diff

An integer indicating the order of differences of the components of PCLM coefficients. Default value: 2.

opt.method

Selection criterion of the model.Possible values are"AIC" and"BIC". Default:"BIC".

max.iter

Maximal number of iterations used in fitting procedure.

tol

Relative tolerance in PCLM fitting procedure. Default: 0.1% i.e. the estimated aggregate bins should be in the 0.1% error margin.

Value

A list with exactly eight control parameters.

See Also

pclm

Examples

control.pclm()

Auxiliary for Controllingpclm2D Fitting

Description

Auxiliary for Controllingpclm2D Fitting

Usage

control.pclm2D(lambda     = c(1, 1),               kr         = 7,               deg        = 3,               int.lambda = c(0.1, 1e+3),               diff       = 2,               opt.method = c("BIC", "AIC"),               max.iter   = 1e+3,               tol        = 1e-3)

Arguments

lambda

Smoothing parameter to be used in pclm estimation.Iflambda = NA an algorithm will find the optimal values.

kr

Knot ratio. Number of internal intervals used for defining 1 knot in B-spline basis construction. SeeMortSmooth_bbase.

deg

Degree of the splines needed to create equally-spaced B-splines basis over an abscissa of data.

int.lambda

Iflambda is optimized an interval to be searched needs to be specified. Format: vector containing the end-points.

diff

An integer indicating the order of differences of the components of PCLM coefficients. Default value: 2.

opt.method

Selection criterion of the model.Possible values are"AIC" and"BIC". Default:"BIC".

max.iter

Maximal number of iterations used in fitting procedure.

tol

Relative tolerance in PCLM fitting procedure. Default: 0.1% i.e. the estimated aggregate bins should be in the 0.1% error margin.

Value

A list with exactly eight control parameters.

See Also

pclm2D

Examples

control.pclm2D()

Create an additional bin with a small value at the end. Improves convergence.

Description

Create an additional bin with a small value at the end. Improves convergence.

Usage

create.artificial.bin(i, vy = 1, vo = 1.01)

Arguments

i

A list of input values corresponding to pclm or pclm2D;

vy

Numerical value of the bin created fory input;

vo

Numerical values of the bin created foroffset input (if the case).


Delete from results the last group added artificially in pclm and pclm2D

Description

Delete from results the last group added artificially in pclm and pclm2D

Usage

delete.artificial.bin(M)

Arguments

M

A pclm.fit object


Extract Fractional Part of a Number

Description

Extract Fractional Part of a Number

Usage

frac(x)

Arguments

x

A numeric value, vector or matrix


Map groups and borders

Description

We assume no missing values between the bins

Usage

map.bins(x, nlast, out.step)

Arguments

x

Vector containing the starting values of the input intervals/bins.For example: if we have 3 bins[0,5), [5,10) and [10, 15),x will be defined by the vector:c(0, 5, 10).

nlast

Length of the last interval. In the example abovenlast would be 5.

out.step

Length of estimated intervals in output. Values between 0.1 and 1 are accepted. Default: 1.


Objective function

Description

Objective function

Usage

ofun(L, I, type)

Arguments

L

lambda.hat

I

Input object from pclm function

type

Type of PCLM model. Options:"1D", "2D" for univariate and two-dimensional model respectively.


Optimize Smoothing ParametersThis function optimize searches oflambda, kr anddeg. Seecontrol.pclm to see what is their meaning. The optimization process works in steps. Simultaneous optimization was tested and found inefficient.

Description

Optimize Smoothing ParametersThis function optimize searches oflambda, kr anddeg. Seecontrol.pclm to see what is their meaning. The optimization process works in steps. Simultaneous optimization was tested and found inefficient.

Usage

optimize_par(I, type)

Arguments

I

Input object from pclm function

type

Type of PCLM model. Options:"1D", "2D" for univariate and two-dimensional model respectively.


Univariate Penalized Composite Link Model (PCLM)

Description

Fit univariate penalized composite link model (PCLM) to ungroup binned count data, e.g. age-at-death distributions grouped in age classes.

Usage

pclm(  x,  y,  nlast,  offset = NULL,  out.step = 1,  ci.level = 95,  verbose = FALSE,  control = list())

Arguments

x

Vector containing the starting values of the input intervals/bins.For example: if we have 3 bins[0,5), [5,10) and [10, 15),x will be defined by the vector:c(0, 5, 10).

y

Vector with counts to be ungrouped. It must have the same dimension asx.

nlast

Length of the last interval. In the example abovenlast would be 5.

offset

Optional offset term to calculate smooth mortality rates. A vector of the same length as x and y. See Rizzi et al. (2015) for further details.

out.step

Length of estimated intervals in output. Values between 0.1 and 1 are accepted. Default: 1.

ci.level

Level of significance for computing confidence intervals. Default:95.

verbose

Logical value. Indicates whether a progress bar should be shown or not.Default:FALSE.

control

List with additional parameters:

  • lambda – Smoothing parameter to be used in pclm estimation.Iflambda = NA an algorithm will find the optimal values.

  • kr – Knot ratio. Number of internal intervals used for defining 1 knot in B-spline basis construction. SeeMortSmooth_bbase.

  • deg – Degree of the splines needed to create equally-spaced B-splines basis over an abscissa of data.

  • int.lambda – Iflambda is optimized an interval to be searched needs to be specified. Format: vector containing the end-points.

  • diff – An integer indicating the order of differences of the components of PCLM coefficients.

  • opt.method – Selection criterion of the model.Possible values are"AIC" and"BIC".

  • max.iter – Maximal number of iterations used in fitting procedure.

  • tol – Relative tolerance in PCLM fitting procedure.

Details

The PCLM method is based on the composite link model, which extends standard generalized linear models. It implements the idea that the observed counts, interpreted as realizations from Poisson distributions, are indirect observations of a finer (ungrouped) but latent sequence. This latent sequence represents the distribution of expected means on a fine resolution and has to be estimated from the aggregated data. Estimates are obtained by maximizing a penalized likelihood. This maximization is performed efficiently by a version of the iteratively reweighted least-squares algorithm. Optimal values of the smoothing parameter are chosen by minimizing Bayesian or Akaike's Information Criterion.

Value

The output is a list with the following components:

input

A list with arguments provided in input. Saved for convenience.

fitted

The fitted values of the PCLM model.

ci

Confidence intervals around fitted values.

goodness.of.fit

A list containing goodness of fit measures: standard errors, AIC and BIC.

smoothPar

Estimated smoothing parameters:lambda, kr anddeg.

bins.definition

Additional values to identify the bins limits and location in input and output objects.

deep

A list of objects created in the fitting process. Useful in diagnosis of possible issues.

call

An unevaluated function call, that is, an unevaluated expression which consists of the named function applied to the given arguments.

References

Rizzi S, Gampe J, Eilers PHC (2015).“Efficient Estimation of Smooth Distributions From Coarsely Grouped Data.”American Journal of Epidemiology,182(2), 138-147.doi:10.1093/aje/kwv020.

See Also

control.pclmplot.pclm

Examples

# Data  x <- c(0, 1, seq(5, 85, by = 5))y <- c(294, 66, 32, 44, 170, 284, 287, 293, 361, 600, 998,        1572, 2529, 4637, 6161, 7369, 10481, 15293, 39016)offset <- c(114, 440, 509, 492, 628, 618, 576, 580, 634, 657,             631, 584, 573, 619, 530, 384, 303, 245, 249) * 1000nlast <- 26 # the size of the last interval# Example 1 ----------------------M1 <- pclm(x, y, nlast)ls(M1)summary(M1)fitted(M1)plot(M1)# Example 2 ----------------------# ungroup even in smaller intervalsM2 <- pclm(x, y, nlast, out.step = 0.5)head(fitted(M1))plot(M1, type = "s")# Note, in example 1 we are estimating intervals of length 1. In example 2 # we are estimating intervals of length 0.5 using the same aggregate data.# Example 3 ----------------------# Do not optimise smoothing parameters; choose your own. Faster.M3 <- pclm(x, y, nlast, out.step = 0.5,            control = list(lambda = 100, kr = 10, deg = 10))plot(M3)summary(M2)summary(M3) # not the smallest BIC here, but sometimes is not important.# Example 4 -----------------------# Grouped x & grouped offset (estimate death rates)M4 <- pclm(x, y, nlast, offset)plot(M4, type = "s")# Example 5 -----------------------# Grouped x & ungrouped offset (estimate death rates)ungroupped_Ex <- pclm(x, y = offset, nlast, offset = NULL)$fitted # ungroupped offset dataM5 <- pclm(x, y, nlast, offset = ungroupped_Ex)

Compute Confidence Intervals for PCLM output

Description

Compute Confidence Intervals for PCLM output

Usage

pclm.confidence(fit, out.step, y, SE, ci.level, type, offset)

Arguments

fit

Fitted values from pclm.fit

out.step

Length of estimated intervals in output. Values between 0.1 and 1 are accepted. Default: 1.

y

Vector with counts to be ungrouped. It must have the same dimension asx.

SE

Standard Errors

ci.level

Level of significance for computing confidence intervals. Default:95.

type

Type of PCLM model. Options:"1D", "2D" for univariate and two-dimensional model respectively.

offset

Optional offset term to calculate smooth mortality rates. A vector of the same length as x and y. See Rizzi et al. (2015) for further details.


Compute Confidence Intervals for estimated distribution

Description

Compute Confidence Intervals for estimated distribution

Usage

pclm.confidence.dx(X)

Arguments

X

List of inputs from pclm.confidence function


Compute Confidence Intervals for estimated hazard

Description

Compute Confidence Intervals for estimated hazard

Usage

pclm.confidence.mx(X)

Arguments

X

List of inputs from pclm.confidence function


Fit PCLM Models

Description

This is an internal function used to estimate PCLM model. It is used bypclm andpclm2D functions

Usage

pclm.fit(  x,  y,  nlast,  offset,  out.step,  verbose,  lambda,  kr,  deg,  diff,  max.iter,  tol,  type)

Arguments

x

Vector containing the starting values of the input intervals/bins.For example: if we have 3 bins[0,5), [5,10) and [10, 15),x will be defined by the vector:c(0, 5, 10).

y

Vector with counts to be ungrouped. It must have the same dimension asx.

nlast

Length of the last interval. In the example abovenlast would be 5.

offset

Optional offset term to calculate smooth mortality rates. A vector of the same length as x and y. See Rizzi et al. (2015) for further details.

out.step

Length of estimated intervals in output. Values between 0.1 and 1 are accepted. Default: 1.

verbose

Logical value. Indicates whether a progress bar should be shown or not.Default:FALSE.

lambda

Smoothing parameter to be used in pclm estimation.Iflambda = NA an algorithm will find the optimal values.

kr

Knot ratio. Number of internal intervals used for defining 1 knot in B-spline basis construction. SeeMortSmooth_bbase.

deg

Degree of the splines needed to create equally-spaced B-splines basis over an abscissa of data.

diff

An integer indicating the order of differences of the components of PCLM coefficients. Default value: 2.

max.iter

Maximal number of iterations used in fitting procedure.

tol

Relative tolerance in PCLM fitting procedure. Default: 0.1% i.e. the estimated aggregate bins should be in the 0.1% error margin.

type

Type of PCLM model. Options:"1D", "2D" for univariate and two-dimensional model respectively.


Validate input values

Description

Validate input values

Usage

pclm.input.check(X, pclm.type)

Arguments

X

A list with input arguments provided inpclm function


Two-Dimensional Penalized Composite Link Model (PCLM-2D)

Description

Fit two-dimensional penalized composite link model (PCLM-2D), e.g. simultaneous ungrouping of age-at-death distributions grouped in age classes for adjacent years. The PCLM can be extended to a two-dimensional regression problem. This is particularly suitable for mortality analysis when mortality surfaces are to be estimated to capture both age-specific trajectories of coarsely grouped distributions and time trends (Rizzi et al. 2019).

Usage

pclm2D(  x,  y,  nlast,  offset = NULL,  out.step = 1,  ci.level = 95,  verbose = TRUE,  control = list())

Arguments

x

Vector containing the starting values of the input intervals/bins.For example: if we have 3 bins[0,5), [5,10) and [10, 15),x will be defined by the vector:c(0, 5, 10).

y

data.frame with counts to be ungrouped. The number of rows should be equal with the length ofx.

nlast

Length of the last interval. In the example abovenlast would be 5.

offset

Optional offset term to calculate smooth mortality rates. A vector of the same length as x and y. See Rizzi et al. (2015) for further details.

out.step

Length of estimated intervals in output. Values between 0.1 and 1 are accepted. Default: 1.

ci.level

Level of significance for computing confidence intervals. Default:95.

verbose

Logical value. Indicates whether a progress bar should be shown or not. Default:TRUE.

control

List with additional parameters:

  • lambda – Smoothing parameter to be used in pclm estimation.Iflambda = NA an algorithm will find the optimal values.

  • kr – Knot ratio. Number of internal intervals used for defining 1 knot in B-spline basis construction. SeeMortSmooth_bbase.

  • deg – Degree of the splines needed to create equally-spaced B-splines basis over an abscissa of data.

  • int.lambda – Iflambda is optimized an interval to be searched needs to be specified. Format: vector containing the end-points.

  • diff – An integer indicating the order of differences of the components of PCLM coefficients.

  • opt.method – Selection criterion of the model.Possible values are"AIC" and"BIC".

  • max.iter – Maximal number of iterations used in fitting procedure.

  • tol – Relative tolerance in PCLM fitting procedure.

Value

The output is a list with the following components:

input

A list with arguments provided in input. Saved for convenience.

fitted

The fitted values of the PCLM model.

ci

Confidence intervals around fitted values.

goodness.of.fit

A list containing goodness of fit measures: standard errors, AIC and BIC.

smoothPar

Estimated smoothing parameters:lambda, kr anddeg.

bins.definition

Additional values to identify the bins limits and location in input and output objects.

deep

A list of objects created in the fitting process. Useful in diagnosis of possible issues.

call

An unevaluated function call, that is, an unevaluated expression which consists of the named function applied to the given arguments.

References

Rizzi S, Gampe J, Eilers PHC (2015).“Efficient Estimation of Smooth Distributions From Coarsely Grouped Data.”American Journal of Epidemiology,182(2), 138-147.doi:10.1093/aje/kwv020.

Rizzi S, Halekoh U, Thinggaard M, Engholm G, Christensen N, Johannesen TB, Lindahl-Jacobsen R (2019).“How to estimate mortality trends from grouped vital statistics.”International Journal of Epidemiology,48(2), 571–582.doi:10.1093/ije/dyy183.

See Also

control.pclm2Dplot.pclm2D

Examples

# Input dataDx <- ungroup.data$DxEx <- ungroup.data$Ex# Aggregate data to be ungrouped in the examples below# Select a 10y data framex      <- c(0, 1, seq(5, 85, by = 5))nlast  <- 26n      <- c(diff(x), nlast)group  <- rep(x, n)y      <- aggregate(Dx, by = list(group), FUN = "sum")[, 2:10]offset <- aggregate(Ex, by = list(group), FUN = "sum")[, 2:10]# Example 1 ---------------------- # Fit model and ungroup data using PCLM-2DP1 <- pclm2D(x, y, nlast)summary(P1)# Plot fitted valuesplot(P1)# Plot input dataplot(P1, "observed")# NOTE: pclm2D does not search for optimal smoothing parameters by default# (like pclm does) because it is more time consuming. If optimization is # required set lambda = c(NA, NA):P1 <- pclm2D(x, y, nlast, control = list(lambda = c(NA, NA)))# Example 2 ---------------------- # Ungroup and build a mortality surfaceP2 <- pclm2D(x, y, nlast, offset)summary(P2)plot(P2, type = "observed")plot(P2, type = "fitted")plot(P2, type = "fitted", colors = c("blue", "red"))

Generic Plot for pclm Class

Description

Generic Plot for pclm Class

Usage

## S3 method for class 'pclm'plot(x, xlab, ylab, ylim, type, lwd, col, legend, legend.position, ...)

Arguments

x

An object of classpclm

xlab

a label for the x axis, defaults to a description ofx.

ylab

a label for the y axis, defaults to a description ofy.

ylim

the y limits of the plot.

type

1-character string giving the type of plot desired. The following values are possible, for details, see plot: "p" for points, "l" for lines, "b" for both points and lines, "c" for empty points joined by lines, "o" for overplotted points and lines, "s" and "S" for stair steps and "h" for histogram-like vertical lines. Finally, "n" does not produce any points or lines.

lwd

Line width, a positive number, defaulting to 2.

col

Three colours to be used in the plot for observed values, fitted values and confidence intervals.

legend

a character orexpression vectorof length\ge 1 to appear in the legend. Otherobjects will be coerced byas.graphicsAnnot.

legend.position

Legend position, or the x and y co-ordinates to be used to position the legend.

...

other graphical parameters (seepar for more details).

See Also

pclm

Examples

# See complete examples in pclm help page

Generic Plot for pclm2D Class

Description

The generic plot for apclm2D object is constructed usingpersp method.

Usage

## S3 method for class 'pclm2D'plot(  x,  type = c("fitted", "observed"),  colors = c("#b6e3db", "#e5d9c2", "#b5ba61", "#725428"),  nbcol = 25,  xlab = "x",  ylab = "y",  zlab = "values",  phi = 30,  theta = 210,  border = "grey50",  ticktype = "simple",  ...)

Arguments

x

an object of classpclm2D.

type

chart type. Defines which data are plotted,"fitted" values or"observed" input data. Default:"fitted".

colors

colors to interpolate; must be a valid argument tocol2rgb().

nbcol

dimension of the color palette. Number of colors. Default: 25.

xlab,ylab,zlab

titles for the axes. N.B. These must becharacter strings; expressions are not accepted. Numbers will becoerced to character strings.

theta,phi

angles defining the viewing direction.theta gives the azimuthal direction andphithe colatitude.

border

the color of the line drawn around the surface facets.The default,NULL, corresponds topar("fg").A value ofNA will disable the drawing of borders: this issometimes useful when the surface is shaded.

ticktype

character:"simple" draws just an arrowparallel to the axis to indicate direction of increase;"detailed" draws normal ticks as per 2D plots.

...

any other argument to be passed topersp.

See Also

pclm2D

Examples

# See complete examples in pclm2D help page

Print for pclm method

Description

Print for pclm method

Usage

## S3 method for class 'pclm'print(x, ...)

Arguments

x

An object of class"pclm"

...

further arguments passed to or from other methods.


Print method for pclm2D

Description

Print method for pclm2D

Usage

## S3 method for class 'pclm2D'print(x, ...)

Arguments

x

An object of class"pclm2D"

...

further arguments passed to or from other methods.


Print for summary.pclm method

Description

Print for summary.pclm method

Usage

## S3 method for class 'summary.pclm'print(x, ...)

Arguments

x

An object of class"summary.pclm"

...

further arguments passed to or from other methods.


Print method for summary.pclm2D

Description

Print method for summary.pclm2D

Usage

## S3 method for class 'summary.pclm2D'print(x, ...)

Arguments

x

An object of class"summary.pclm2D"

...

further arguments passed to or from other methods.


Print function forungroup.data

Description

Print function forungroup.data

Usage

## S3 method for class 'ungroup.data'print(x, ...)

Arguments

x

Anungroup.data object

...

further arguments passed to or from other methods.


Extract PCLM Deviance Residuals

Description

Extract PCLM Deviance Residuals

Usage

## S3 method for class 'pclm'residuals(object, ...)

Arguments

object

an object for which the extraction of model residuals ismeaningful.

...

other arguments.

Value

Residuals extracted from the objectobject.

Examples

x <- c(0, 1, seq(5, 85, by = 5))y <- c(294, 66, 32, 44, 170, 284, 287, 293, 361, 600, 998,        1572, 2529, 4637, 6161, 7369, 10481, 15293, 39016)M1 <- pclm(x, y, nlast = 26)residuals(M1)

Extract PCLM-2D Deviance Residuals

Description

Extract PCLM-2D Deviance Residuals

Usage

## S3 method for class 'pclm2D'residuals(object, ...)

Arguments

object

an object for which the extraction of model residuals ismeaningful.

...

other arguments.

Value

Residuals extracted from the objectobject.

Examples

Dx <- ungroup.data$DxEx <- ungroup.data$Ex# Aggregate data to ungroup it in the example belowx      <- c(0, 1, seq(5, 85, by = 5))nlast  <- 26n      <- c(diff(x), nlast)group  <- rep(x, n)y      <- aggregate(Dx, by = list(group), FUN = "sum")[, -1]# ExampleP1 <- pclm2D(x, y, nlast)residuals(P1)

Sequence function with last value

Description

Sequence function with last value

Usage

seqlast(from, to, by)

Arguments

from,to

the starting and (maximal) end values of thesequence. Of length1 unless justfrom is supplied asan unnamed argument.

by

number: increment of the sequence.


Suggest values ofout.step that do not require an adjustment ofnlast

Description

Suggest values ofout.step that do not require an adjustment ofnlast

Usage

suggest.valid.out.step(len, increment = 0.01)

Arguments

len

Interval length

increment

Increment


Summary for pclm method

Description

Summary for pclm method

Usage

## S3 method for class 'pclm'summary(object, ...)

Arguments

object

an object for which a summary is desired.

...

additional arguments affecting the summary produced.


Summary method for pclm2DGeneric function used to produce result summaries of the results produced bypclm2D.

Description

Summary method for pclm2DGeneric function used to produce result summaries of the results produced bypclm2D.

Usage

## S3 method for class 'pclm2D'summary(object, ...)

Arguments

object

an object for which a summary is desired.

...

additional arguments affecting the summary produced.


Test Dataset in the Package

Description

Dataset containing death counts (Dx) and exposures (Ex) by age for a certain population between 1980 and 2014. The data-set is provided for testing purposes only and might be altered and outdated. Download actual demographic data free of charge from Human Mortality Database (2018). Once a username and a password is created on thewebsite theMortalityLaws R package can be used to extract data in R format.

Usage

ungroup.data

Format

An object of classungroup.data of length 2.

Source

Human Mortality Database

References

Human Mortality Database (2018).“University of California, Berkeley (USA), and Max Planck Institute for Demographic Research (Germany). Data downloaded on 17/01/2018.”https://www.mortality.org.

Pascariu MD (2018).MortalityLaws: Parametric Mortality Models, Life Tables and HMD.R package version 1.6.0,https://github.com/mpascariu/MortalityLaws.

See Also

ReadHMD


Check ifnlast needs to be adjusted in order to accommodateout.step

Description

Check ifnlast needs to be adjusted in order to accommodateout.step

Usage

validate.nlast(x, nlast, out.step)

Arguments

x

Vector containing the starting values of the input intervals/bins.For example: if we have 3 bins[0,5), [5,10) and [10, 15),x will be defined by the vector:c(0, 5, 10).

nlast

Length of the last interval. In the example abovenlast would be 5.

out.step

Length of estimated intervals in output. Values between 0.1 and 1 are accepted. Default: 1.


[8]ページ先頭

©2009-2025 Movatter.jp