Movatterモバイル変換

Version:

2.61

Date:

2025-10-06

Title:

Statistical Analysis in Epidemiology

Depends:

R (≥ 3.5.0), utils

Imports:

cmprsk, etm, splines, MASS, survival, plyr, dplyr, Matrix,numDeriv, data.table, zoo, mgcv, magrittr

Suggests:

mstate, nlme, lme4, demography, popEpi, tidyr

Description:

Functions for demographic and epidemiological analysis in the Lexis diagram, i.e. register and cohort follow-up data. In particular representation, manipulation, rate estimation and simulation for multistate data - the Lexis suite of functions, which includes interfaces to 'mstate', 'etm' and 'cmprsk' packages. Contains functions for Age-Period-Cohort and Lee-Carter modeling and a function for interval censored data. Has functions for extracting and manipulating parameter estimates and predicted values (ci.lin and its cousins), as well as a number of epidemiological data sets.

License:

GPL-2

URL:

http://bendixcarstensen.com/Epi/

NeedsCompilation:

yes

Packaged:

2025-10-06 15:00:40 UTC; BCAR0029

Author:

Bendix Carstensen [aut, cre], Martyn Plummer [aut], Esa Laara [ctb], Michael Hills [ctb]

Maintainer:

Bendix Carstensen <b@bxc.dk>

Repository:

CRAN

Date/Publication:

2025-10-07 05:10:08 UTC

Epi: Functions for manipulation and statistical analysis of epidemiological data

Description

Epi has grown out of the course 'Statistical Practise inEpidemiology with R'http://bendixcarstensen.com/SPE/.

The major contributions from this course have been thestat.table function for advanced tabulation and summary,and the functions for representation and theLexisfunction(s) for manipulation of multistate data with multiple timescales.

Details

Click on theIndex link below the line to accessvignettes (tutorial documents) and an alphabetic list of the functionsinEpi.

hivDK: seroconversion in a cohort of Danish men

Description

Data from a survey of HIV-positivity of a cohort of Danishmen followed by regular tests from 1983 to 1989.

Usage

  data(hivDK)

Format

A data frame with 297 observations on the following 7 variables.

id: ID of the person
entry: Date of entry to the study. Date variable.
well: Date last seen seronegative. Date variable.
ill: Date first seen seroconverted. Date variable.
bth: Year of birth minus 1950.
pyr: Annual number of sexual partners.
us: Indicator of wheter the person has visited the USA.

Source

Mads Melbye, Statens Seruminstitut.

References

Becker N.G. and Melbye M.: Use of a log-linear model tocompute the empirical survival curve from interval-censored data,with application to data on tests for HIV-positivity, AustralianJournal of Statistics, 33, 125–133, 1990.

Melbye M., Biggar R.J., Ebbesen P., Sarngadharan M.G., WeissS.H., Gallo R.C. and Blattner W.A.: Seroepidemiology of HTLV-IIIantibody in Danish homosexual men: prevalence, transmission anddisease outcome. British Medical Journal, 289, 573–575, 1984.

Examples

  data(hivDK)  str(hivDK)

The Aalen-Johansen estimator of state probabilities from amultistate`Lexis` object.

Description

The Aalen-Johansen estimator is computed on the basis of aLexis multistate object along a given time scale. Thefunction is a wrapper for thesurvfit.

Usage

## S3 method for class 'Lexis'AaJ(Lx,               formula = ~ 1,             timeScale = 1, ...)

Arguments

Lx

ALexis object. The starting state must be the firstamonglevels(Lx).

formula

A one-sided formula passed on tosurvfit.

timeScale

Character or integer, selecting one of the timescalesof theLexis object.

...

Arguments passed on. Ignored.

Value

An object of classsurvfitms — seesurvfit.

Author(s)

Bendix Carstensen,http://bendixcarstensen.com

Examples

data(DMlate)str(DMlate)dml <- Lexis(entry = list(Per = dodm,                          Age = dodm-dobth,                        DMdur = 0 ),              exit = list(Per = dox),       exit.status = factor(!is.na(dodth),                            labels = c("DM","Dead")),              data = DMlate )# Cut the follow-up at insulin startdmi <- cutLexis(dml,                cut = dml$doins,          new.state = "Ins",        split.state = TRUE)summary( dmi )ms <- AaJ.Lexis(dmi, timeScale = "DMdur")class(ms)ms$stateshead(ms$pstate)

Births in Denmark by year and month of birth and sex

Description

The number of live births as entered from printed publications fromStatistics Denmark.

Usage

data(B.dk)

Format

A data frame with 1248 observations on the following 4 variables.

year: Year of birth
month: Month of birth
m: Number of male live births
f: Number of female live births

Details

Division of births by month and sex is only available for the years 1957–69and 2002ff. For the remaining period, the total no. births in each month isdivided between the sexes so that the fraction of boys is equal to the overallfraction for the years where the sex information is available.

There is a break in the series at 1920, when Sonderjylland was joined to Denmark.

Source

Statistiske Undersogelser nr. 19: Befolkningsudvikling og sundhedsforhold1901-60, Copenhagen 1966.Befolkningens bevaegelser 1957Befolkningens bevaegelser 1958...Befolkningens bevaegelser 2003Befolkningens bevaegelser 2004Vital Statistics 2005Vital Statistics 2006

Examples

data(B.dk)str(B.dk)attach(B.dk)# Plot the no of births and the M/F-ratiopar(las = 1, mar = c(4,4,2,4))matplot(year + (month - 0.5) / 12, cbind(m, f),        bty = "n", col = c("blue", "red"), lty = 1, lwd = 1, type = "l",        ylim = c(0, 5000), xlab = "Date of birth", ylab = "" )usr <- par()$usrmtext("Monthly no. births in Denmark", side = 3,      at = usr[1],  adj = 0.25, line = 1/1.6)text(usr[1:2] %*% cbind(c(19,1), c(19,1)) / 20,     usr[3:4] %*% cbind(c(1,19), c(2,18)) / 20,     c("Boys","Girls"), col = c("blue","red"), adj = 0 ) lines(year + (month - 0.5) / 12, (m / (m + f) - 0.5) * 30000, lwd = 1)axis(side = 4, at = (seq(0.505, 0.525, 0.005)-0.5) * 30000, labels = NA, tcl = -0.3 )axis(side = 4, at = (50:53 / 100 - 0.5) * 30000, labels = 50:53, tcl = -0.5 )axis(side = 4, at = (0.54 - 0.5) * 30000, labels = "% boys",     tick = FALSE, mgp = c(3,0.1,0))abline(v = 1920, col = gray(0.8))

Clinical status,relapse, metastasis and death in 2982 women with breast cancer.

Description

This dataset is a transformation of the example dataset used by Crowtherand Lambert in their multistate paper.

Usage

data(BrCa)

Format

A data frame with 2982 observations on the following 17 variables:

pid: Person-id; numeric
year: Calendar year of diagnosis
age: Age at diagnosis
meno: Menopausal status; a factor with levelsprepost
size: Tumour size; a factor with levels<=20 mm>20-50 mm>50 mm
grade: Tumour grade; a factor with levels23
nodes: Number of positive lymph nodes, a numeric vector
pr: Progesteron receptor level
pr.tr: Transformed progesteron level
er: Estrogen receptor level
hormon: Hormon therapy at diagnosis; a factor with levelsnoyes
chemo: Chemotherapy treatment; a factor with levelsnoyes
tor: Time of relapse, years since diagnosis
tom: Time of metastasis, years since diagnosis
tod: Time of death, years since diagnosis
tox: Time of exit from study, years since diagnosis
xst: Vital status at exit; a factor with levelsAliveDead

Details

The dataset has been modified to contain the times (since diagnosis) of the events ofinterest, to comply with the usual structure of data.

Source

The original data were extracted from:http://fmwww.bc.edu/repec/bocode/m/multistate_example.dta, thisis modified representation of the same amount of information.

References

The data were used as example in the paper by Crowther andLambert: Parametric multistate survival models: Flexiblemodelling allowing transition-specific distributions with applicationto estimating clinically useful measures of effect differences; StatMed 36 (29), pp 4719-4742, 2017. (No, it is not the paper, just thetitle.)

A parallel analysis using theLexis machinery is availableas:http://bendixcarstensen.com/AdvCoh/papers/bcMS.pdf

Examples

data(BrCa)

Conversion to diabetes

Description

Data from a randomized intervention study ("Addition") where persons withprediabetic conditions are followed up for conversion to diabetes (DM).Conversion dates are interval censored.Original data are not published yet, so id-numbers have been changed andall dates have been randomly perturbed.

Usage

data(DMconv)

Format

A data frame with 1519 observations on the following 6 variables.

id: Person identifier
doe: Date of entry, i.e. first visit.
dlw: Date last seen well, i.e. last visit without DM.
dfi: Date first seen ill, i.e. first visit with DM.
gtol: Glucose tolerance. Factor with levels:1="IFG" (impaired fasting glucose), 2="IGT" (impaired glucose tolerance).
grp: Randomization. Factor with levels:1="Intervention", 2="Control".

Source

Signe Saetre Rasmussen, Steno Diabetes Center. The Addition Study.

Examples

data(DMconv)str(DMconv)head(DMconv)

Epidemiological rates for diabetes in Denmark 1996–2015

Description

Usage

data("DMepi")

Format

A data frame with 4200 observations on the following 8 variables.

sex: a factor with levelsM,F
A: Age class, 0–99
P: Calendar year, 1996–2016
X: Number of new diagnoses of diabetes among persons without diabetes
D.nD: Number of deaths among persons without diabetes
Y.nD: Person-years among persons without diabetes
D.DM: Number of deaths among persons with diabetes
Y.DM: Person-years among persons with diabetes

Details

Based on registers of the Danish population. Only included forillustrative purposes. Cannot be used as scientifically validateddata, since small numbers are randomly permuted between units.

Examples

data(DMepi)# Total deaths and person-years in the Danish populationftable( addmargins( xtabs( cbind( Deaths=D.nD+D.DM,                                    PYrs=Y.nD+Y.DM ) ~ P + sex,                           data=DMepi ),                    2 ),        row.vars = 1 )# Deaths and person-years in the population of diabetes patientsround(ftable( addmargins( xtabs( cbind( Deaths=D.DM,                                    PYrs=Y.DM ) ~ P + sex,                           data=DMepi ),                    2 ),        row.vars = 1 ) )# Model for age-specific incidence rates;inc <- glm( X ~ sex + Ns( A, knots=seq(30,80,10) ) + P,                offset = log(Y.nD),                family = poisson,                  data = DMepi )# Predict for men and women separately in 2010:ndm <- data.frame( sex="M", A=20:90, P=2010, Y.nD=1000 )ndf <- data.frame( sex="F", A=20:90, P=2010, Y.nD=1000 )prM <- ci.pred( inc, ndm )prF <- ci.pred( inc, ndf )matplot( ndm$A, cbind(prM,prF),         type="l", lty=1, lwd=c(3,1,1),         col=rep(c("blue","red"),each=3),         log="y", xlab="Age", ylab="DM incidence per 1000 PY" )# This is a proportional hazards model - add sex-age interactionint <- update( inc, . ~ . + sex:Ns( A, knots=seq(30,80,10) ) )prM <- ci.pred( int, ndm )prF <- ci.pred( int, ndf )matplot( ndm$A, cbind(prM,prF),         type="l", lty=1, lwd=c(3,1,1),         col=rep(c("blue","red"),each=3),         log="y", xlab="Age", ylab="DM incidence per 1000 PY" )# The rate-ratio is teased out using the ci.exp:RRp <- ci.exp( inc, list(ndm,ndf) )RRi <- ci.exp( int, list(ndm,ndf) )# and added to the plotmatlines( ndm$A, cbind(RRi,RRp),          type="l", lty=1, lwd=c(3,1,1), col=gray(rep(c(0.3,0.7),each=3)) )abline(h=1)axis(side=4)mtext( "Male/Female IRR", side=4, line=2 )

The Danish National Diabetes Register.

Description

These two datasets each contain a random sample of 10,000 persons fromthe Danish National Diabetes Register.DMrand is a random samplefrom the register, whereasDMlate is a random sample among thosewith date of diagnosis after 1.1.1995. All dates are radomly jittered byadding a U(-7,7) (days).

Usage

data(DMrand)       data(DMlate)

Format

A data frame with 10000 observations on the following 7 variables.

sex: Sex, a factor with levelsMF
dobth: Date of birth
dodm: Date of inclusion in the register
dodth: Date of death
dooad: Date of 2nd prescription of OAD
doins: Date of 2nd insulin prescription
dox: Date of exit from follow-up.

Details

All dates are given in fractions of years, so 1998.000corresponds to 1 January 1998 and 1998.997 to 31 December 1998.

All dates are randomly perturbed by a small amount, so no realpersons have any of the combinations of the 6 dates in thedataset. But results derived from the data will be quite close tothose that would be obtained if the entire 'real' diabetes registerwere used.

Source

Danish National Board of Health.

References

B Carstensen, JK Kristensen, P Ottosen and K Borch-Johnsen:The Danish National Diabetes Register: Trends in incidence, prevalence andmortality, Diabetologia, 51, pp 2187–2196, 2008.

In partucular see the appendix at the end of the paper.

Examples

data(DMlate)str(DMlate)dml <- Lexis( entry=list(Per=dodm, Age=dodm-dobth, DMdur=0 ),               exit=list(Per=dox),        exit.status=factor(!is.na(dodth),labels=c("DM","Dead")),               data=DMlate )# Cut the follow-up at insulin start, and introduce a new timescale,# and split non-precursor statessystem.time(dmi <- cutLexis( dml, cut = dml$doins,                      pre = "DM",                new.state = "Ins",                new.scale = "t.Ins",             split.states = TRUE ) )summary( dmi )

Fits a regression model to interval censored data.

Description

The models fitted assumes a piecewise constant baseline rate inintervals specified by the argumentbreaks, and for thecovariates either a multiplicative relative risk function (default) oran additive excess risk function.

Usage

  Icens(first.well, last.well, first.ill,        formula, model.type = c("MRR", "AER"), breaks,        boot = FALSE, alpha = 0.05, keep.sample = FALSE,        data)## S3 method for class 'Icens'summary(object, scale = 1, ...)## S3 method for class 'Icens'print(x, scale = 1, digits = 4, ...)

Arguments

first.well

Time of entry to the study, i.e. the time first seenwithout event. Numerical vector.

last.well

Time last seen without event. Numerical vector.

first.ill

Time first seen with event. Numerical vector.

formula

Model formula for the log relative risk.

model.type

Which model should be fitted.

breaks

Breakpoints between intervals in which the underlyingtimescale is assumed constant. Any observation outside the range ofbreaks is discarded.

boot

Should bootstrap be performed to produce confidenceintervals for parameters. If a number is given this will be thenumber of bootsrap samples. The default is 1000.

alpha

1 minus the confidence level.

keep.sample

Should the bootstrap sample of the parameter valuesbe returned?

data

Data frame in which the times and formula areinterpreted.

object

anIcens object.

x

anIcens object.

scale

scaling factor for rates.

digits

how many digits is used for printing results.

...

Other parameters passed on.

Details

The model is fitted by calling eitherfit.mult orfit.add.

Value

An object of class"Icens": a list with three components:

rates

A glm object from a binomial model with log-link,estimating the baseline rates, and the excess risk if"AER"is specfied.

cov

A glm object from a binomial model with complementarylog-log link, estimating the log-rate-ratios. Only if"MRR"is specfied.

niter

Nuber of iterations, a scalar

boot.ci

Ifboot=TRUE, a 3-column matrix with estimatesand 1-alpha confidence intervals for the parameters in the model.

sample

A matrix of the parameterestimates from thebootstrapping. Rows refer to parameters, columns to bootstrap samples.

Author(s)

Martyn Plummer,martyn.plummer@r-project.org,Bendix Carstensen,b@bxc.dk

References

B Carstensen: Regression models for interval censoredsurvival data: application to HIV infection in Danish homosexualmen. Statistics in Medicine, 15(20):2177-2189, 1996.

CP Farrington: Interval censored survival data: a generalized linearmodelling approach. Statistics in Medicine, 15(3):283-292, 1996.

Examples

data( hivDK )# Convert the dates to fractional years so that rates are# expressed in cases per yearfor( i in 2:4 ) hivDK[,i] <- cal.yr( hivDK[,i] )m.RR <- Icens( entry, well, ill,               model="MRR", formula=~pyr+us, breaks=seq(1980,1990,5),               data=hivDK)# Currently the MRR model returns a list with 2 glm objects.round( ci.lin( m.RR$rates ), 4 )round( ci.lin( m.RR$cov, Exp=TRUE ), 4 )# There is actually a print method:print( m.RR )m.ER <- Icens( entry, well, ill,               model="AER", formula=~pyr+us, breaks=seq(1980,1990,5),               data=hivDK)# There is actually a print method:print( m.ER )

Fit Lee-Carter-type models for rates to arbitrarily shaped observationsof rates in a Lexis diagram.

Description

The Lee-Carter model is originally defined as a model for ratesobserved in A-sets (age by period) of a Lexis diagram, aslog(rate(x,t)) = a(x) + b(x)k(t), using one parameter per age(x) andperiod(t). This function uses natural splines for a(), b() and k(),placing knots for each effect such that the number of events is thesame between knots.

Usage

LCa.fit( data, A, P, D, Y,         model = "APa",    # or one of "ACa", "APaC", "APCa" or "APaCa"          a.ref,            # age reference for the interactions        pi.ref = a.ref,    # age reference for the period interaction        ci.ref = a.ref,    # age reference for the cohort interaction         p.ref,            # period reference for the interaction         c.ref,            # cohort reference for the interactions          npar = c(a = 6,  # no. knots for main age-effect                   p = 6,  # no. knots for period-effect                   c = 6,  # no. knots for cohort-effect                  pi = 6,  # no. knots for age in the period interaction                  ci = 6), # no. knots for age in the cohort interaction            VC = TRUE,     # numerical calculation of the Hessian?         alpha = 0.05,     # 1 minus confidence level           eps = 1e-6,     # convergence criterion         maxit = 100,      # max. no iterations         quiet = TRUE )    # cut the crap## S3 method for class 'LCa'print( x, ... )## S3 method for class 'LCa'summary( object, show.est=FALSE, ... )## S3 method for class 'LCa'plot( x, ... )## S3 method for class 'LCa'predict( object, newdata,                        alpha = 0.05,                        level = 1-alpha,                          sim = ( "vcov" %in% names(object) ),                          ... )

Arguments

data

A data frame. Must have columnsA(age),P(period, that iscalendar time),D(no. of events) andY(person-time,exposure). Alternatively these four quantities can be given asseparate vectors:

A

Vector of ages (midpoint of observation).

P

Vector of period (midpoint of observation).

D

Vector of no. of events.

Y

Vector of person-time. Demographers would say "exposure", bewildering epidemiologists.

a.ref

Reference age for the age-interaction term(s)pi(x) and/orpi(x), wherepi(a.ref)=1 andci(a.ref)=1.

pi.ref

Same, but specifically for the interaction with period.

ci.ref

Same, but specifically for the interaction with cohort.

p.ref

Reference period for the time-interaction termkp(t) wherekp(p.ref)=0.

c.ref

Reference period for the time-interaction termkp(t) wherekc(c.ref)=0.

model

Character, either"APa" which is the classical Lee-Carter modelfor log-rates, other possibilities are"ACa","APCa","APaC" or"APaCa", see details.

npar

A (possibly named) vector or list, with either the number of knots orthe actual vectors of knots for each term. If unnamed, components aretaken to be in the order (a,b,t), if the model is "APaCa" in the order(a,p,c,pi,ci). If a vector, the three integers indicate the number ofknots for each term; these will be placed so that there is an equalnumber of events (D) between each, and half as many below thefirst and above the last knot. Ifnpar is a list of scalars thebehavior is the same. Ifnpar is a list of vectors, these aretaken as the knots for the natural splines. See details for namingconvention.

VC

Logical. Should the variance-covariance matrix of the parameters becomputed by numerical differentiation? See details.

alpha

1 minus the confidence level used when calculatingconfidence intervals for estimates inLCa.fit and forpredictions bypredict.LCa.

eps

Convergence criterion for the deviance, we use the the relativedifference between deviance from the two models fitted.

maxit

Maximal number of iterations.

quiet

Shall I shut up or talk extensively to you about iteration progression etc.?

object

AnLCa object, see under "Value".

show.est

Logical. Should the estimates be printed?

x

AnLCa object, see under "Value".

newdata

Prediction data frame, must have columnsA andP. AnyY column is ignored, predictions are given inunits of theY supplied for the call that generated theLCa object.

level

Confidence level.

sim

Logical or numeric. IfTRUE, prediction c.i.s will bebased on 1000 simulations from the posterior parameters. If numeric,it will be based on that number of simulations.

...

Additional parameters passed on to the method.

Details

The Lee-Carter model is non-linear in age and time so does not fitin the classical glm-Poisson framework. But for fixedb(x) itis a glm, and also for fixeda(x),k(t). The functionalternately fits the two versions until the same fit is produced (samedeviance).

The multiplicative age by period term could equally well have been amultiplicative age by cohort or even both. Thus the most extensivemodel has 5 continuous functions:

\log(\lambda(a,p)) = f(a) + b_p(a)k_p(p) + b_c(a)k_c(p-a)

Each of these is fitted by a natural spline, with knots placed at thequantiles of the events along the age (a), calendar time (p) respectivecohort (p-a) scales. Alternatively the knots can be specified explicitlyin the argumentnpar as a named list, wherea refers tof(a),p refers tok_p(p),c refers tok_c(p-a),pi (periodinteraction) refers tob_p(a)andci (cohortinteraction) refers tob_c(p-a).

The naming convention for the models is a capitalP and/orC if the effect is in the model followed by a lower casea if there is an interaction with age. Thus there are 5 differentmodels that can be fitted:APa,ACa,APaCAPCaandAPaCa.

The standard errors of the parameters from the two separate model fitsin the iterations are however wrong; they are conditional on a subsetof the parameters having a fixed value. However, analytic calculationof the Hessian is a bit of a nightmare, so this is done numericallyusing thehessian function from thenumDeriv package ifVC=TRUE.

The coefficients and the variance-covariance matrix of these are usedinpredict.LCa for a parametric bootstrap (that is, asimulation from a multivariate normal with mean equal to the parameterestimates and variance as the estimated variance-covariance) to getconfidence intervals for the predictions ifsim isTRUE— which it is by default if they are part of the object.

Theplot forLCa objects merely produces between 3 and 5panels showing each of the terms in the model. These are mainly forpreliminary inspection; real reporting of the effects should useproper relative scaling of the effects.

Value

LCa.fit returns an object of classLCa (smootheffectsLee-Carter model); it is a list with thefollowing components:

model

Character, eitherAPa,ACa,APaC,APCa orAPaCa, indicating the variable(s) interactingwith age.

ax

3-column matrix of age-effects, c.i. from the age-timemodel. Row names are the unique occurring ages in thedataset. Estimates are rates.

pi

3-column matrix of age-period interaction effects, c.i. from the agemodel. Row names are the actually occurring ages in thedataset. Estimates are multipliers of the log-RRs inkp,centered at 1 atpi.ref.

kp

3-column matrix of period-effects, with c.i.s from theage-time model. Row names are the actually occurring times in the dataset. Estimates are rate-ratios centered at 1 atp.ref.

ci

3-column matrix of age-cohort interaction effects, c.i. from the agemodel. Row names are the actually occurring ages in thedataset. Estimates are multipliers of the log-RRs inkc,centered at 1 atci.ref.

kc

3-column matrix of cohort-effects, with c.i.s from the age-timemodel. Row names are the actually occurring times in thedataset. Estimates are rate-ratios centered at 1 atc.ref.

mod.at

glm object with the final age-time model — estimatesthe termsax,kp,kc. Givesthe same fit as themod.b model after convergence.

mod.b

glm object with the final age model — estimatesthe termspi,ci. Givesthe same fit as themod.at model after convergence.

coef

All coefficients from both models, in the orderax,kp,kc,pi,ci. Only present ifLCa.fit were called withVC=TRUE (the default).

vcov

Variance-covariance matrix of coefficients from bothmodels, in the same order as in thecoef. Only present ifLCa.fit were called withVC=TRUE.

knots

List of vectors of knots used in for the age, period andcohort effects.

refs

List of reference points used for the age, period andcohort terms in the interactions.

deviance

Deviance of the model

df.residual

Residual degrees of freedom

iter

Number of iterations used to reach convergence.

plot.LCa plots the estimated effects in separate panels,using a log-scale for the baseline rates (ax) and the time-RR(kt). For theAPaCa model 5 panels are plotted.

summary.LCa returns (invisibly) a matrix with the parametersfrom the models and a column of the conditional se.s and additionallyof the se.s derived from the numerically computed Hessian (ifLCa.fit were called withVC=TRUE.)

predict.LCa returns a matrix with one row per row innewdata. IfLCa.fit were called withVC=TRUEthere will be 3 columns, namely prediction (1st column) and c.i.sbased on a simulation of parameters from a multivariate normal withmeancoef and variancevcov using the median andalpha/2 quantiles from thesim simulations. IfLCa.fit were called withVC=FALSE there will be 6columns, namely estimates and c.i.s from age-time model(mod.at), and from the age-interaction model (mod.b),both using conditional variances, and therefore likely with too narrowconfidence limits.

Author(s)

Bendix Carstensen,http://bendixcarstensen.com

This function was conceived while teaching a course on APC models atthe Max Planck Institute of Demographic Research (MPIDR,https://www.demogr.mpg.de/en/) in Rostock in May 2016(http://bendixcarstensen.com/APC/MPIDR-2016/), and finishedduring a week long research stay there, kindly sponsored by the MPIDR.

Examples

library( Epi )# Load the testis cancer data by Lexis trianglesdata( testisDK )tc <- subset( testisDK, A>14 & A<60 )head( tc )# We want to see rates per 100,000 PYtc$Y <- tc$Y / 10^5# Fit the Lee-Carter model with age-period interaction (default)LCa.tc <- LCa.fit( tc, model="ACa", a.ref=30, p.ref=1980, quiet=FALSE, eps=10e-4, maxit=50 )LCa.tcsummary( LCa.tc )# Inspect what we gotnames( LCa.tc )# show the estimated effectspar( mfrow=c(1,3) )plot( LCa.tc )# Prediction data frame for ages 15 to 60 for two time points: nd <- data.frame( A=15:60 )# LCa predictionsp70 <- predict.LCa( LCa.tc, newdata=cbind(nd,P=1970), sim=1000 )p90 <- predict.LCa( LCa.tc, newdata=cbind(nd,P=1990), sim=1000 )# Inspect the curves from the parametric bootstrap (simulation):par( mfrow=c(1,1) )head( cbind(p70,p90) )matplot( nd$A, cbind(p70,p90), type="l", lwd=c(6,3,3), lty=c(1,3,3),         col=rep( 2:3, each=3 ), log="y",         ylab="Testis cancer incidence per 100,000 PY in 1970 resp. 1990", xlab="Age" )

Create a Lexis object of follow-up

Description

Create an object of classLexis to represent follow-up inmultiple states on multiple time scales.

Usage

Lexis( entry,        exit,    duration,entry.status = 0, exit.status = 0,          id,        data,       merge = TRUE,      states,       notes = TRUE,         tol = .Machine$double.eps^0.5,keep.dropped = FALSE)## S3 method for class 'Lexis'print(x, ...,                      td = 2,                      nd = td,                    rnam = FALSE,                     org = FALSE)

Arguments

entry

a named list of entry times. Each element of the list isa numeric variable representing the entry time on the named timescale. The name of the elements of the list will appear as names ofvariables designated as timescales in the resulting object. All timescales must have the same units (e.g. years). The names of the timescales must be different from any column name indata.

exit

a named list of exit times.

duration

a numeric vector giving the duration of follow-up.

entry.status

a vector or a factor giving the status atentry

exit.status

a vector or factor giving status at exit. Anychange in status during follow-up is assumed to take place exactlyat the exit time.

id

a vector giving a unique identity value for each personrepresented in the Lexis object. Defaults to1:nrow(data)

data

an optional data frame, list, or environment containingthe variables. If not found indata, the variables aretaken from the environment from whichLexis was called.

merge

a logical flag. IfTRUE then thedataargument will be coerced to a data frame and then merged withthe resultingLexis object.

states

A vector of labels for the states. If given, the statevariableslex.Cst andlex.Xst are returned as factors withidentical levels attributes equal tostates.

notes

Logical. Should notes on entry states and time be given.

tol

Numerical tolerance for follow-up time. Rows with durationless than this value are automatically dropped.

keep.dropped

Logical. Should dropped rows fromdata besaved as an attribute with the object for inspection?

x

ALexis object.

td

Number of digits after the decimal separator used fortimescales andlex.dur when printing

nd

Number of digits after the decimal separator used for othernumerical variables in theLexis object.

rnam

Logical, should row names be printed?

org

Logical, should columns be printed in the original order?

...

Other parameters passed on toprint.data.frame.

Details

The analysis of long-term population-based follow-up studies typicallyrequires multiple time scales to be taken into account, such asage, calendar time, or time since an event. ALexis object isa data frame with additional attributes that allows these multiple timedimensions of follow-up to be managed.

Separate variables for current end exit state allows representation ofmultistate data.

Lexis objects are named after the German demographer WilhelmLexis (1837-1914), who is credited with the invention of the"Lexis diagram" for representing population dynamics simultaneouslyby several timescales in the book"Einleitung in die Theorie der Bevolkerungsstatistik" from 1875.

TheLexis function can create a minimalLexis objectwith only those variables required to define the follow-up history ineach row. Additional variables can be merged into theLexisobject using themerge method forLexis objects. Thelatter is the default.

Theprint method prints the time-scale variables and othernumerical variables rounded, possibly differently. Reorders columns sothe Lexis-specific variables comes first. Returns (invisibly) a charactervector with the (re)ordering of the columns in the object, even iforg = TRUE is set.

There are alsomerge,subset,transform and manyother methods forLexis objects. They work as the correspondingmethods for data-frames but ensures that the result is aLexisobject.

Value

An object of classLexis. This is represented as a data framewith a column for each time scale (with names equal to the union ofthe names ofentry andexit), and additional columns with thefollowing names:

lex.id

Identification of the persons.

lex.dur

Duration of follow-up.

lex.Cst

Entry status (Currentstate),i.e. the state in which the follow up takes place.

lex.Xst

Exit status (eXitstate),i.e. that state taken up afterdur inlex.Cst.

Ifmerge=TRUE (the default) then theLexis object willalso contain all variables from thedata argument.

Note

Only two of the three argumentsentry,exit andduration need to be given. If the third parameter is missing,it is imputed.

entry,exit must be numeric, usingDatevariables will cause some of the utilities to crash. Transformation bycal.yr is recommended.

If only eitherexit orduration are supplied it isassumed thatentry is 0. This is only meaningful (and thereforechecked) if there is only one timescale.

If any ofentry.status orexit.status are of mode character,they will both be converted to factors.

Ifentry.status is not given, then its class is automaticallyset to that ofexit.status. Ifexit.status is acharacter or factor, the value ofentry.status is set to thefirst level. This may be highly undesirable, and therefore noted. Forexample, ifexit.status is character the first level will bethe first in the alphabetical ordering; slightly unfortunate if valuesarec("Well","Diseased"). Ifexit.status is logical, thevalue ofentry.status set toFALSE. Ifexit.status is numeric, the value ofentry.status set to0.

Ifentry.status orexit.status are factors or character,the corresponding state variables in the returnedLexis object,lex.Cst andlex.Xst will be (unordered) factors withidentical set of levels, namely the union of the levels ofentry.status andexit.status.

Author(s)

Martyn Plummer with contributions from Bendix Carstensen

Examples

# A small bogus cohortxcoh <- structure(list( id = c("A", "B", "C"),                     birth = c("14/07/1952", "01/04/1954", "10/06/1987"),                     entry = c("04/08/1965", "08/09/1972", "23/12/1991"),                      exit = c("27/06/1997", "23/05/1995", "24/07/1998"),                      fail = c(1, 0, 1) ),                    .Names = c("id", "birth", "entry", "exit", "fail"),                 row.names = c("1", "2", "3"),                     class = "data.frame")# Convert the character dates into numerical variables (fractional years)xcoh <- cal.yr(xcoh, format="%d/%m/%Y", wh=2:4)# xcoh <- cal.yr(xcoh, format="%d/%m/%Y", wh=2:4)# See how it looksxcohstr( xcoh )# Define a Lexis object with timescales calendar time and ageLcoh <- Lexis(entry = list(per = entry ),               exit = list(per = exit,                           age = exit - birth),        exit.status = fail,               data = xcoh)# Using character states may have undesired effects:xcoh$Fail <- c("Dead","Well","Dead")xcohL1 <- Lexis(entry = list(per = entry),             exit = list(per = exit,                         age = exit - birth),      exit.status = Fail,             data = xcoh)L1# people start being dead!# ...unless you order the levels sensiblyxcoh$Fail <- factor(xcoh$Fail, levels = c("Well", "Dead"))L2 <- Lexis(entry = list(per = entry),             exit = list(per = exit,                         age = exit - birth),      exit.status = Fail,             data = xcoh)L2# behaviour of print method:L2[,1:6]L2[,6:1]print(L2[,6:1], org=TRUE)(print(L2[,-3]))

Plot a Lexis diagram

Description

Draws a Lexis diagram, optionally with life lines from a cohort, andwith lifelines of a cohort if supplied. Intended for presentation purposes.

Usage

Lexis.diagram( age = c( 0, 60),               alab = "Age",              date = c( 1940, 2000 ),              dlab = "Calendar time",               int = 5,           lab.int = 2*int,           col.life = "black",          lwd.life = 2,          age.grid = TRUE,         date.grid = TRUE,          coh.grid = FALSE,          col.grid = gray(0.7),          lwd.grid = 1,                    las = 1,        entry.date = NA,         entry.age = NA,         exit.date = NA,          exit.age = NA,         risk.time = NA,        birth.date = NA,              fail = NA,          cex.fail = 1.1,           pch.fail = c(NA,16),          col.fail = rep( col.life, 2 ),              data = NULL, ... )

Arguments

age

Numerical vector of length 2, giving the age-range for the diagram

alab

Label on the age-axis.

date

Numerical vector of length 2, giving the calendartime-range for the diagram

dlab

label on the calendar time axis.

int

The interval between grid lines in the diagram. If avector of length two is given, the first value will be used forspacing of age-grid and the second for spacing of the date grid.

lab.int

The interval between labelling of the grids.

col.life

Colour of the life lines.

lwd.life

Width of the life lines.

age.grid

Should grid lines be drawn for age?

date.grid

Should grid lines be drawn for date?

coh.grid

Should grid lines be drawn for birth cohorts (diagonals)?

col.grid

Colour of the grid lines.

lwd.grid

Width of the grid lines.

las

How are the axis labels plotted?

entry.date,entry.age,exit.date,exit.age,risk.time,birth.date

Numerical vectors defining lifelines to be plottedin the diagram. At least three must be given to produce lines.Not all subsets of three will suffice, the given subset has to define life lines.If insufficient data is given, no lifelines are produced.

fail

Logical of event status at exit for the persons whose life lines are plotted.

pch.fail

Symbols at the end of the life lines for censorings(fail==0) and failures (fail != 0).

cex.fail

Expansion of the status marks at the end of life lines.

col.fail

Character vector of length 2 giving the colour of thefailure marks for censorings and failures respectively.

data

Dataframe in which to interpret the arguments.

...

Arguments to be passed on to the initial call to plot.

Details

The default unit for supplied variables are (calendar) years.If any of the variablesentry.date,exit.date orbirth.date are of class "Date" or if any of the variablesentry.age,exit.age orrisk.time are of class"difftime", they will be converted to calendar years, and plottedcorrectly in the diagram. The returned dataframe will then have colums ofclasses "Date" and "difftime".

Value

If sufficient information on lifelines is given, a data frame withone row per person and columns with entry ages and dates, birth date,risk time and status filled in.

Side effect: a plot of a Lexis diagram is produced with the life linesin it is produced. This will be the main reason for using thefunction. If the primary aim is to illustrate follow-up of a cohort, thenit is better to represent the follow-up in aLexis object, anduse the genericplot.Lexis function.

Author(s)

Bendix Carstensen,http://bendixcarstensen.com

Examples

Lexis.diagram( entry.age = c(3,30,45),               risk.time = c(25,5,14),              birth.date = c(1970,1931,1925.7),                    fail = c(TRUE,TRUE,FALSE) )LL <- Lexis.diagram( entry.age = sample( 0:50, 17, replace=TRUE ),                     risk.time = sample( 5:40, 17, r=TRUE),                    birth.date = sample( 1910:1980, 17, r=TRUE ),            fail = sample( 0:1, 17, r=TRUE ),               cex.fail = 1.1,              lwd.life = 2 )# Identify the persons' entry and exitstext( LL$exit.date, LL$exit.age, paste(1:nrow(LL)), col="red", font=2, adj=c(0,1) )text( LL$entry.date, LL$entry.age, paste(1:nrow(LL)), col="blue", font=2, adj=c(1,0) )data( nickel )attach( nickel )LL <- Lexis.diagram( age=c(10,100), date=c(1900,1990),              entry.age=age1st, exit.age=ageout, birth.date=dob,      fail=(icd %in% c(162,163)), lwd.life=1,     cex.fail=0.8, col.fail=c("green","red") )abline( v=1934, col="blue" )nickel[1:10,]LL[1:10,]

Draw life lines in a Lexis diagram.

Description

Add life lines to a Lexis diagram.

Usage

Lexis.lines( entry.date = NA,              exit.date = NA,             birth.date = NA,              entry.age = NA,               exit.age = NA,              risk.time = NA,               col.life = "black",               lwd.life = 2,                   fail = NA,               cex.fail = 1,               pch.fail = c(NA, 16),               col.fail = col.life,                   data = NULL )

Arguments

entry.date,entry.age,exit.date,exit.age,risk.time,birth.date

Numerical vectors defining lifelines to be plottedin the diagram. At least three must be given to produce lines.Not all subsets of three will suffice, the given subset has to define life lines. If insufficient data is given, no lifelines are produced.

col.life

Colour of the life lines.

lwd.life

Width of the life lines.

fail

Logical of event status at exit for the persons whose life lines are plotted.

cex.fail

The size of the status marks at the end of life lines.

pch.fail

The status marks at the end of the life lines.

col.fail

Colour of the marks for censorings and failuresrespectively.

data

Data frame in which to interpret values.

Value

If sufficient information on lifelines is given, a data frame with onerow per person and columns with entry ages and dates, birth date, risktime and status filled in.

Side effect: Life lines are added to an existing Lexisdiagram. Lexis.lines adds life lines to an existing plot.

Author(s)

Bendix Carstensen, Steno Diabetes Center,http://bendixcarstensen.com

Examples

Lexis.diagram( entry.age = c(3,30,45),               risk.time = c(25,5,14),              birth.date = c(1970,1931,1925.7),                    fail = c(TRUE,TRUE,FALSE) )Lexis.lines( entry.age = sample( 0:50, 100, replace=TRUE ),             risk.time = sample( 5:40, 100, r=TRUE),            birth.date = sample( 1910:1980, 100, r=TRUE ),                  fail = sample(0:1,100,r=TRUE),              cex.fail = 0.5,              lwd.life = 1 )

Convert a Lexis obejct to a data set suitable for input to the`msm:msm` function.

Description

The number of records in the resulting dataset will have anumber of records that is normallynrec(Lx) +nid(Lx), that isone extra record for each person. If there are 'holes' in persons'follow-up, each hole will also generate an extra record in the result.

Usage

Lexis2msm(Lx,       state = "state",     verbose = FALSE)

Arguments

Lx

ALexis object.

state

Character; the name of the state variable in the result.

verbose

If true, you will be reminded what the function did.

Value

A data frame of classmsmLexis with the timescales preserved andlex.idpreserved but with otherlex. variables removed.

Has more records than the originalLexis object

Author(s)

Bendix Carstensen,http://bendixcarstensen.com

Examples

example(mcutLexis)# we now have the Lexis object L3:summary(L3)# data frame for use with msmmsm3 <- Lexis2msm(L3)# see the difference subset(  L3, lex.id %in% 1:3)subset(msm3, lex.id %in% 1:3)timeScales(msm3)

Compute dates/ages for life lines in a Lexis diagram

Description

Fills out the missing information for follow up of persons in a Lexisdiagram if sufficient information is given.

Usage

Life.lines( entry.date = NA,             exit.date = NA,            birth.date = NA,             entry.age = NA,              exit.age = NA,             risk.time = NA )

Arguments

entry.date,exit.date,birth.date,entry.age,exit.age,risk.time

Vectors defining lifelines to be plotted in the diagram. At least three must be given to produce a result.Not all subsets of three will suffice, the given subset has to define life lines. If insufficient data is given, nothing isreturned and a warning is given.

Value

Data frame with variablesentry.date,entry.age,exit.date,exit.age,risk.time,birth.date, with all entries computed for each person. If anyofentry.date,exit.date orbirth.date are ofclassDate or if any ofentry.age,exit.age orrisk.time are of classdifftime the date variables willbe of classDate and the other three of classdifftime.

Examples

( Life.lines( entry.age = c(3,30,45),              risk.time = c(25,5,14),             birth.date = c(1970,1931,1925.7) ) )# Draw a Lexis diagramLexis.diagram()# Compute entry and exit age and date.( LL <-  Life.lines( entry.age = c(3,30,45),                     risk.time = c(25,5,14),                    birth.date = c(1970,1931,1925.7) ) )segments( LL[,1], LL[,2], LL[,3], LL[,4] ) # Plot the life lines.# Compute entry and exit age and date, supplying a date variablebd <- ( c(1970,1931,1925.7) - 1970 ) * 365.25class( bd ) <- "Date"( Life.lines( entry.age = c(3,30,45),              risk.time = c(25,5,14),             birth.date = bd ) )

Mortality in Denmark 1974 ff.

Description

Mortality in one-year classes of age (0-98,99+) and period (1974 ff.) in Denmark.

Usage

data(M.dk)

Format

A data frame with 6400 observations on the following 6 variables.

A: Age-class, 0-98, 99:99+
sex: Sex. 1:males, 2:females
P: Period (year) of death
D: Number of deaths
Y: Number of person-years
rate: Mortality rate per 1000 person-years

Details

Deaths in ages over 100 are in the class labelled 99. Risk time iscomputed by tabulation of the risk time inY.dk, exceptfor the class 99+ where the average of the population size in ages99+ at the first and last date of the year is used.

Source

http://www.statistikbanken.dk/statbank5a/SelectTable/omrade0.asp?SubjectCode=02&PLanguage=1&ShowNews=OFF

Examples

data(M.dk)str(M.dk)zz <- xtabs( rate ~ sex+A+P, data=M.dk )zz[zz==0] <- NA # 0s makes log-scale plots crashpar(mfrow=c(1,2), mar=c(0,0,0,0), oma=c(3,3,1,1), mgp=c(3,1,0)/1.6 )for( i in 1:2 ){matplot( dimnames(zz)[[2]], zz[i,,],         lty=1, lwd=1, col=rev(heat.colors(37)),         log="y", type="l", ylim=range(zz,na.rm=TRUE),         ylab="", xlab="", yaxt="n" )text( 0, max(zz,na.rm=TRUE), c("M","F")[i], font=2, adj=0:1, cex=2, col="gray" )if( i==1 ) axis( side=2, las=1 )}mtext( side=1, "Age", line=2, outer=TRUE )mtext( side=2, "Mortality rate", line=2, outer=TRUE )

Population size in Denmark

Description

The population size at 1st January in ages 0-99.

Usage

data(N.dk)

Format

A data frame with 7200 observations on the following 4 variables.

sex: Sex, 1:males, 2:females
A: Age. 0:0, 1:1, ..., 98:98, 99:99+
P: Year
N: Number of persons alive at 1st January yearP

Source

http://www.statistikbanken.dk/statbank5a/SelectTable/omrade0.asp?SubjectCode=02&PLanguage=1&ShowNews=OFF

Examples

data(N.dk)str(N.dk)with(N.dk,addmargins(tapply(N,list(P,sex),sum),2))with(subset(N.dk,P==max(P)),addmargins(tapply(N,list(A,sex),sum)))

Create risk time ("Person-Years") in Lexis triangles from population count data.

Description

Data on population size at equidistant dates and age-classes areused to estimate person-time at risk in Lexis-triangles, i.e. classesclassified by age, period AND cohort (date of birth). Only works fordata where age-classes have the same width as the period-intervals.

Usage

N2Y( A, P, N,     data = NULL,     return.dfr = TRUE)

Arguments

A

Name of the age-variable, which should be numeric,corresponding to the left endpoints of the age intervals.

P

Name of the period-variable, which should be numeric,corresponding to the date of population count.

N

The population size at dateP in age classA.

data

A data frame in which arguments are interpreted.

return.dfr

Logical. Should the results be returned as a data frame(defaultTRUE) or as a table.

Details

The calculation of the risk time from the population figures isdone as described in: B. Carstensen: Age-Period-Cohort models for theLexis diagram. Statistics in Medicine, 26: 3018-3045, 2007.

The number of periods in the result is one less than the numberof dates (nP=length(table(P))) in the input, so the number ofdistinct values is2*(nP-1), because theP in the outputis coded differently for upper and lower Lexis triangles.

The number of age-classes is the same as in the input(nA=length(table(A))), so the number of distinct values is2*nA, because theA in the output is coded differentlyfor upper and lower Lexis triangles.

In the paper "Age-Period-Cohort models for the Lexis diagram" Isuggest that the risk time in the lower triangles in the firstage-class and in the upper triangles in the last age-class arecomputed so that the total risk time in the age-class corresponds tothe average of the two population figures for the age-class at eitherend of a period multiplied with the period length. This is the methodused.

Value

A data frame with variablesA,P andY,representing the mean age and period in the Lexis triangles and theperson-time in them, respectively. The person-time is in units of thedistance between population count dates.

Ifreturn.dfr=FALSE a three-way table classified by the left endpoint of the age-classes and the periods and a factorwh takingthe valuesup andlo corresponding to upper (earlycohort) and lower (late cohort) Lexis triangles.

Author(s)

Bendix Carstensen,http://bendixcarstensen.com

References

B. Carstensen: Age-Period-Cohort models for theLexis diagram. Statistics in Medicine, 26: 3018-3045, 2007.

Examples

# Danish population at 1 Jan each year by sex and agedata( N.dk )# An illustrative subset( Nx <- subset( N.dk, sex==1 & A<5 & P<1975 ) )# Show the data in tabular formxtabs( N ~ A + P, data=Nx )# Lexis triangles as data frameNt <- N2Y( data=Nx, return.dfr=TRUE )xtabs( Y ~ round(A,2) + round(P,2), data=Nt )# Lexis triangles as a 3-dim arrayftable( N2Y( data=Nx, return.dfr=FALSE ) )# Calculation of PY for persons born 1970 in 1972( N.1.1972 <- subset( Nx, A==1 & P==1972)$N )( N.2.1973 <- subset( Nx, A==2 & P==1973)$N )N.1.1972/3 + N.2.1973/6N.1.1972/6 + N.2.1973/3# These numbers can be found in the following plot:# Blue numbers are population size at 1 January# Red numbers are the computed person-years in Lexis triangles:Lexis.diagram(age=c(0,5), date=c(1970,1975), int=1, coh.grid=TRUE )with( Nx, text(P,A+0.5,paste(N),srt=90,col="blue") )with( Nt, text(P,A,formatC(Y,format="f",digits=1),col="red") )text( 1970.5, 2, "Population count 1 January", srt=90, col="blue")text( 1974.5, 2, "Person-\nyears", col="red")

Set up an array of NAs, solely from the list of dimnames

Description

Defines an array of NAs, solely from the list of dimnames

Usage

NArray( x, cells=NA )ZArray( x, cells=0 )

Arguments

x

A (possibly named) list to be used as dimnames for theresulting array

cells

Value(s) to fill the array

Details

This is a simple useful way of defining arrays to be used forcollection of results. The point is that everything is defined fromthe named list, so in the process of defining what you want tocollect, there is only one place in the program to edit. It's just awrapper forarray.ZArray is just a wrapper forNArray with a different default.

Value

An array withdimnames attributex, and all valuesequal tocells.

Author(s)

Bendix Carstensen

Examples

ftable(NArray( list(Aye = c("Yes", "Si", "Oui"),             Bee = c("Hum", "Buzz"),             Sea = c("White", "Black", "Red", "Dead") ) ) )

Natural splines - (cubic splines linear beyond outermost knots) withconvenient specification of knots and possibility of centering,detrending and clamping.

Description

This function is partly for convenient specification of natural splinesin practical modeling. The convention used is to take the smallestand the largest of the supplied knots as boundary knots. It also hasthe option of centering the effects provided at a chosen referencepoint as well as projecting the columns on the orthogonal space tothat spanned by the intercept and the linear effect of the variable,and finally fixing slopes beyond boundary knots (clamping).

Usage

Ns( x, ref = NULL, df = NULL,                knots = NULL,            intercept = FALSE,       Boundary.knots = NULL,                fixsl = c(FALSE,FALSE),              detrend = FALSE )

Arguments

x

A variable.

ref

Scalar. Reference point on thex-scale, where theresulting effect will be 0.

df

degrees of freedom.

knots

knots to be used both as boundary and internal knots. IfBoundary.knots are given, this will be taken as the set ofinternal knots.

intercept

Should the intercept be included in the resultingbasis? Ignored if any ofref ordetrend is given.

Boundary.knots

The boundary knots beyond which the spline islinear. Defaults to the minimum and maximum ofknots.

fixsl

Specification of whether slopes beyond outer knots shouldbe fixed to 0.FALSE correponds to no restriction; a curve with 0slope beyond the upper knot is obtained usingc(FALSE,TRUE). Ignored if!(detrend==FALSE).

detrend

IfTRUE, the columns of the spline basis will beprojected to the orthogonal ofcbind(1,x). Optionallydetrend can be given as a vector of non-negative numbers oglengthlength(x), usedto define an inner product asdiag(detrend) for projection onthe orthogonal tocbind(1,x). The default is projectionw.r.t. the inner product defined by the identity matrix.

Value

A matrix of dimension c(length(x),df) where eitherdf wassupplied or ifknots were supplied,df = length(knots) - 1 + intercept.Ns returns a spline basis which is centered atref.Ns with the argumentdetrend=TRUE returns aspline basis which is orthogonal tocbind(1,x) with respect tothe inner product defined by the positive definite matrixdiag(detrend) (an assumption which is checked). Note the latteris data dependent and therefore making predictionswith anewdata argument will be senseless.

Note

The need for this function is primarily from analysis of rates inepidemiology and demography, where the dataset are time-split recordsof follow-up, and the range of data therefore rarely is of anyinterest (let alone meaningful).

In Poisson modeling of rates based on time-split records one shouldaim at having the same number ofevents between knots, ratherthan the same number of observations.

Author(s)

Bendix Carstensenb@bxc.dk,Lars Jorge D\'iaz, Steno Diabetes Center Copenhagen.

Examples

require(splines)require(stats)require(graphics)ns( women$height, df = 3)Ns( women$height, knots=c(63,59,71,67) )# Gives the same results as ns:summary( lm(weight ~ ns(height, df = 3), data = women) )summary( lm(weight ~ Ns(height, df = 3), data = women) )# Get the diabetes data and set up as Lexis objectdata(DMlate)DMlate <- DMlate[sample(1:nrow(DMlate),500),]dml <- Lexis( entry = list(Per=dodm, Age=dodm-dobth, DMdur=0 ),               exit = list(Per=dox),        exit.status = factor(!is.na(dodth),labels=c("DM","Dead")),               data = DMlate )# Split follow-up in 1-year age intervalsdms <- splitLexis( dml, time.scale="Age", breaks=0:100 )summary( dms )# Model  age-specific rates using Ns with 6 knots# and period-specific RRs around 2000 with 4 knots# with the same number of deaths between each pair of knotsn.kn <- 6( a.kn <- with( subset(dms,lex.Xst=="Dead"),                quantile( Age+lex.dur, probs=(1:n.kn-0.5)/n.kn ) ) )n.kn <- 4( p.kn <- with( subset( dms, lex.Xst=="Dead" ),                quantile( Per+lex.dur, probs=(1:n.kn-0.5)/n.kn ) ) )m1 <- glm( lex.Xst=="Dead" ~ Ns( Age, kn=a.kn ) +                             Ns( Per, kn=p.kn, ref=2000 ),           offset = log( lex.dur ),           family = poisson,             data = dms )# Plot estimated age-mortality curve for the year 2005 and knots chosen:nd <- data.frame( Age=seq(40,100,0.1), Per=2005, lex.dur=1000 )par( mfrow=c(1,2) )matplot( nd$Age, ci.pred( m1, newdata=nd ),         type="l", lwd=c(3,1,1), lty=1, col="black", log="y",         ylab="Mortality rates per 1000 PY", xlab="Age (years)", las=1, ylim=c(1,1000) )rug( a.kn, lwd=2 )# Clamped Age effect to the right of rightmost knot.m1.c <- glm( lex.Xst=="Dead" ~ Ns( Age, kn=a.kn, fixsl=c(FALSE,TRUE) ) +                               Ns( Per, kn=p.kn, ref=2000 ),             offset = log( lex.dur ),             family = poisson,               data = dms )# Plot estimated age-mortality curve for the year 2005 and knots chosen.matplot( nd$Age, ci.pred( m1.c, newdata=nd ),         type="l", lwd=c(3,1,1), lty=1, col="black", log="y",         ylab="Mortality rates per 1000 PY", xlab="Age (years)", las=1, ylim=c(1,1000) )rug( a.kn, lwd=2 )par( mfrow=c(1,1) )# Including a linear Age effect of 0.05 to the right of rightmost knot.m1.l <- glm( lex.Xst=="Dead" ~ Ns( Age, kn=a.kn, fixsl=c(FALSE,TRUE) ) +                               Ns( Per, kn=p.kn, ref=2000 ),             offset = log( lex.dur ) + pmax( Age, max( a.kn ) ) * 0.05,             family = poisson,               data = dms )# Plot estimated age-mortality curve for the year 2005 and knots chosen.nd <- data.frame(Age=40:100,Per=2005,lex.dur=1000)matplot( nd$Age, ci.pred( m1.l, newdata=nd ),         type="l", lwd=c(3,1,1), lty=1, col="black", log="y",         ylab="Mortality rates per 1000 PY", xlab="Age (years)", las=1, ylim=c(1,1000) )rug( a.kn, lwd=2 )

Function to compute and draw ROC-curves.

Description

Computes sensitivity, specificity and positive and negative predictivevalues for a test based on dichotomizing along the variabletest, for prediction ofstat. Plots curves of these and a ROC-curve.

Usage

ROC( test = NULL,     stat = NULL,     form = NULL,     plot = c("sp", "ROC"),       PS = is.null(test),       PV = TRUE,       MX = TRUE,       MI = TRUE,      AUC = TRUE,     grid = seq(0,100,10), col.grid = gray( 0.9 ),     cuts = NULL,      lwd = 2,     data = parent.frame(),      ... )

Arguments

test

Numerical variable used for prediction.

stat

Logical variable of true status.

form

Formula used in a logistic regression. If this is given,test andstat are ignored. If not given thenbothtest andstat must be supplied.

plot

Character variable. If "sp", the a plot of sensitivity,specificity and predictive values against test is produced, if "ROC" aROC-curve is plotted. Both may be given.

PS

logical, if TRUE the x-axis in theplot "ps"-plot is the the predicted probability forstat==TRUE, otherwise it is the scale oftest if thisis given otherwise the scale of the linear predictor from thelogistic regression.

PV

Should sensitivity, specificity andpredictive values at the optimal cutpoint be given on the ROC plot?

MX

Should the “optimal cutpoint” (i.e. where sens+spec ismaximal) be indicated on the ROC curve?

MI

Should model summary from the logisticregression model be printed in the plot?

AUC

Should the area under the curve (AUC) be printed in the ROCplot?

grid

Numeric or logical. If FALSE no background grid isdrawn. Otherwise a grid is drawn on both axes atgrid percent.

col.grid

Colour of the grid lines drawn.

cuts

Points on the test-scale to be annotated on theROC-curve.

lwd

Thickness of the curves

data

Data frame in which to interpret the variables.

...

Additional arguments for the plotting of theROC-curve. Passed on toplot

Details

As an alternative to atest and astatus variable, amodel formula may given, in which case the the linear predictor is thetest variable and the response is taken as the true status variable.The test used to derive sensitivity, specificity, PV+ and PV- as afunction ofx istest\geq x as a predictor ofstat=TRUE.

Value

A list with two components:

res

dataframe with variablessens,spec,pvp,pvn and name of the test variable. The latter isthe unique values of test or linear predictor from the logisticregression in ascending order with -Inf prepended. Since thesensitivity is defined asP(test>x)|status=TRUE, the first rowhassens equal to 1 andspec equal to 0, correspondingto drawing the ROC curve from the upper right to the lower left corner.

lr

glm object with the logistic regression result used forconstruction of the ROC curve

0, 1 or 2 plots are produced according to the setting ofplot.

Author(s)

Bendix Carstensen, Steno Diabetes Center Copenhagen,http://bendixcarstensen.com

Examples

x <- rnorm( 100 )z <- rnorm( 100 )w <- rnorm( 100 )tigol <- function( x ) 1 - ( 1 + exp( x ) )^(-1)y <- rbinom( 100, 1, tigol( 0.3 + 3*x + 5*z + 7*w ) )ROC( form = y ~ x + z, plot="ROC" )

Reorder and combine levels of a factor

Description

The levels of a factor are re-ordered so that the levels specified byref appear first and remaining levels are moved down. This isuseful forcontr.treatment contrasts which take the first levelas the reference. Factor levels may also be combined; two possibilities forspecifying this are supported: hard coding or table look-up.

Usage

## S3 method for class 'factor'Relevel( x, ref, first = TRUE, collapse="+",                 xlevels=TRUE, nogroup=TRUE, ... )

Arguments

x

A(n unordered) factor

ref

Vector, list or data frame, array, matrix or table.

Ifref is a vector (integer or character), it is assumed itcontains the names or numbers of levels to be the first ones; nonmentioned levels are kept.

Ifref is a list (but not a data frame), factor levelsmentioned in each list element are combined. If the list is namedthe names are used as new factor levels, otherwise new level namesare constructed from the old.

Ifref is a data frame or 2-dimensional array, matrix ortable, the first column is assumed to have unique levels ofxand the second to have groupings of this, respectively.

first

Should the levels mentioned inref (if it is alist) come before those not?

collapse

String used when constructing names for combinedfactor levels.

xlevels

Logical. Should all levels in the 2nd column ofref be maintained as levels of the result, or (ifFALSE) only the actually occurring.

nogroup

Logical. Should levels present in the input but not inthe 1st column ofref be maintained as levels after thegrouping? IfFALSE, NAs will be returned for such elements.

...

Arguments passed on to other methods.

Details

The facility whereref is a two-column matrix mimics theSAS-facility of formats where a dataset can be used to construct aformat — SAS format is the grouping tool for variablevalues.

Ifref is a two-column object andref[,2] is a factorRelevel will preserve the order of levels fromref[,2].

Value

An unordered factor, where levels ofx have been reorderedand/or collapsed.

Author(s)

Bendix Carstensenhttp://bendixcarstensen.com, Lars Jorge Diaz

Examples

# Grouping using a list (hard coding)#ff <- factor(sample(letters[1:5], 100, replace = TRUE))table( ff, Relevel(ff, list( AB = 1:2, "Dee" = 4, c(3,5))))table( ff, Relevel(ff,                   list( 5:4, Z = c("c", "a") ),                   coll = "-und-",                   first = FALSE ) )## Grouping using a two-column matrix as input:## A factor with levels to be grouped togetherff <- factor(c("Bear","Bear","Crocodile","Snake","Crocodile","Bear"))ff## A grouping table(gg <- data.frame(Animal = c("Bear","Whale","Crocodile","Snake","Eagle"),                   Class = c("Mammal","Mammal","Reptile","Reptile","Bird")))str(gg)Relevel(ff, gg, xlevels = FALSE)Relevel(ff, gg )Relevel(ff, gg[c(1:5,5:1),])## This crashes with an error(GG <- rbind( gg, c("Bear","Reptile")))try(Relevel(ff, GG))ff <- factor(c(as.character(ff), "Jellyfish", "Spider"))Relevel(ff, gg)# excludes non-occupied levelsRelevel(ff, gg, xlevels = FALSE)# If you do not want unknown animals classified, this returns NAs:Relevel(ff, gg, nogroup = FALSE)# BothRelevel(ff, gg, nogroup = FALSE, xlevels = FALSE)

Salmonella Typhimurium outbreak 1996 in Denmark.

Description

Matched case-control study of food poisoning.

Format

A data frame with 136 observations on the following 15 variables:

`id`:	Person identification
`set`:	Matched set indicator
`case`:	Case-control status (1:case, 0:control
`age`:	Age of individual
`sex`:	Sex of individual (1:male, 2:female)
`abroad`:	Within the last two weeks visited abroad (1:yes, 0:no)
`beef`:	Within the last two weeks eaten beef
`pork`:	Within the last two weeks eaten pork
`veal`:	Within the last two weeks eaten veal
`poultry`:	Within the last two weeks eaten poultry
`liverp`:	Within the last two weeks eaten liverpaste
`veg`:	Within the last two weeks eaten vegetables
`fruit`:	Within the last two weeks eaten fruit
`egg`:	Within the last two weeks eaten eggs
`plant7`:	Within the last two weeks eaten meat from plant no. 7

Details

In the fall of 1996 an unusually large number of SalmonellaTyphimurium cases were recorded in Fyn county in Denmark. The DanishZoonosis Centre set up a matched case-control study to find thesources. Cases and two age-, sex- and residency-matched controls weretelephone interviewed about their food intake during the last twoweeks.

The participants were asked at which retailer(s) they had purchasedmeat. Retailers were independently of this linked to meat processingplants, and thus participants were linked to meat processingplants. This way persons could be linked to (amongst other) plant no 7.

Source

Tine Hald.

References

Molbak K and Hald T: Salmonella Typhimurium outbreak in late summer1996. A Case-control study. (In Danish:Salmonella typhimurium udbrud paa Fyn sensommeren1996. En case-kontrol undersogelse.) Ugeskrift for Laeger.,159(36):5372-7, 1997.

Examples

data(S.typh)

A wrapper for`termplot` that optionally (but by default)exponentiates terms, and plot them on a common log-scale. Also scalesx-axes to the same physical scale.

Description

The function usestermplot to extract terms from a modelwith, say, spline, terms, including the standard errors, computesconfidence intervals and transform these to the rate / rate-ratioscale. Thus the default use is for models on the log-scale such asPoisson-regression models. The function produces a plot with panelsside-by-side, one panel per term, and returns the

Usage

  Termplot( obj,           plot = TRUE,           xlab = NULL,           ylab = NULL,            xeq = TRUE,           yshr = 1,          alpha = 0.05,          terms = NULL,         max.pt = NULL )

Arguments

obj

An object with aterms-method — for details thethe documentation fortermplot.

plot

Should a plot be produced?

xlab

Labels for thex-axes. Defaults to the names of the terms.

ylab

Labels for thex-axes. Defaults to blank.

xeq

Should the units all all plots have the same physical scalefor thex-axes).

yshr

Shrinking ofy-axis. By default, they-axeshave an extent that accommodates the entire range of confidenceintervals. This is a shrinking parameter for they-axes,setting it to less than 1 will lose a bit of the confidence limitson some of the panels.

alpha

1 minus the confidence level for computing confidenceintervals

terms

Which terms should be reported. Passed on totermplot and eventuallypredict.

max.pt

The maximal number of points in which to report theterms. IfNULL all unique points from the analysis datasetare reported for each term (this is a feature oftermplot).

Value

A list with one component per term in the model objectobj,each component is a 4-column matrix with $x$ as the first column, and3 columns with estimae and lower and upper confidence limit.

Author(s)

Bendix Cartensen

Examples

# Get the diabetes data and set up as Lexis objectdata(DMlate)DMlate <- DMlate[sample(1:nrow(DMlate),500),]dml <- Lexis( entry = list(Per=dodm, Age=dodm-dobth, DMdur=0 ),               exit = list(Per=dox),        exit.status = factor(!is.na(dodth),labels=c("DM","Dead")),               data = DMlate )# Split in 1-year age intervalsdms <- splitLexis( dml, time.scale="Age", breaks=0:100 )# Model with 6 knots for both age and periodn.kn <- 6# Model age-specific rates with period referenced to 2004( a.kn <- with( subset(dms,lex.Xst=="Dead"),                quantile( Age+lex.dur, probs=(1:n.kn-0.5)/n.kn ) ) )( p.kn <- with( subset(dms,lex.Xst=="Dead"),                quantile( Per+lex.dur, probs=(1:n.kn-0.5)/n.kn ) ) )m2 <- glm( lex.Xst=="Dead" ~ -1 +                             Ns( Age, kn=a.kn, intercept=TRUE ) +                             Ns( Per, kn=p.kn, ref=2004 ),           offset = log( lex.dur ), family=poisson, data=dms )# Finally we can plot the two effects:Termplot( m2, yshr=0.9 )

Population risk time in Denmark

Description

Risk time (person-years) in the Danish population, classified by sex,age, period and date of birth in 1-year classes. This corresponds totriangles in a Lexis diagram.

Usage

data(Y.dk)

Format

A data frame with 13860 observations on the following 6 variables.

sex: Sex. 1:males, 2:females
A: One-year age class
P: Period
C: Birth cohort
Y: Person-years
upper: Indicator of upper triangle in the Lexis diagram

Details

The risk time is computed from the population size figures inN.dk, using the formulae devised in:B. Carstensen: Age-period-cohort models for the Lexis diagram.Statistics in Medicine, 10; 26(15):3018-45, 2007.

Source

http://www.statistikbanken.dk/statbank5a/SelectTable/omrade0.asp?SubjectCode=02&PLanguage=1&ShowNews=OFF

Examples

data(Y.dk)str(Y.dk)# Compute mean age, period for the trianglesattach( Y.dk )age <- A + (1+upper)/3per <- P + (2-upper)/3# Plot a Lexis diagramlibrary( Epi )Lexis.diagram( age=c(0,10), date=c(1990,2000), coh.grid=TRUE, int=1 )box()# Print the person-years for males theretext( per[sex==1], age[sex==1],      formatC( Y[sex==1]/1000, format="f", digits=1 ) )

Add covariates (typically clinical measurements) taken at known timesto a Lexis object.

Description

When follow-up in a multistate model is represented in aLexis object we may want to add information oncovariates, for example clinical measurements, obtained at differenttimes. This function cuts the follow-up time (seecutLexis) at the times of measurement and carries themeasurements forward in time to the next measurement occasion.

Usage

## S3 method for class 'Lexis'addCov(Lx,                     clin,                timescale = 1,                    exnam,                      tfc = "tfc", ...)

Arguments

Lx

A Lexis object with follow-up of a cohort.

clin

A data frame with covariates to add (typically clinicalmeasurements). Must contain a variablelex.id identifying thepersons represented inLx, as well as a variable with thesame name as one of thetimeScales inLx,identifying the time at which covariates are measured.

The times must be unique within each person; if not records withduplicate times are discarded, and a warning issued. This is doneusingduplicated, so not very well-defined, it is advisablethat you do this yourself.

timescale

Numerical or character. Number or name of a timescale inLx. Theclin data frame must have a variable of thisname indicating the time at which the covariate measurements weretaken.

exnam

Character. Name of the variable inclin with the examinationnames (such aswave1,wave2 etc.). Values may not berepeated within person and cannot be equal to any oflevels(Lx). Will be carried over to the resultingLexis object. If there is no such variable inclin itwill be constructed; if the argument is omitted, a variable calledexnam with valuesex1,ex2 etc. will beconstructed.

tfc

Character (timefrom lastclinical visit). Nameof the variable in the result which will contain the time since themost recent covariate date. It will be included among the timescalesof the resulting Lexis object.

If the argument is omitted a variable calledtfc will beconstructed.

...

Arguments passed on. Ignored.

Value

ALexis object representing the same follow-up asLx, with cuts added at the times of examination, and covariatemeasurements added for all records representing follow-up after themost recent time of measurement.

Alsotfc is added as a time scale, it is however not a propertimescale since it is reset at every clinical examination. Thereforthe value of thetimeSince attribute is set to "X" in orderto distinguish it from other proper time scales that either have anempty string or the name of a state.

Author(s)

Bendix Carstensen,b@bxc.dk,http://bendixcarstensen.com

Examples

# A small bogus cohortxcoh <- structure( list( id = c("A", "B", "C"),                      birth = c("1952-07-14", "1954-04-01", "1987-06-10"),                      entry = c("1965-08-04", "1972-09-08", "1991-12-23"),                       exit = c("1997-06-27", "1995-05-23", "1998-07-24"),                       fail = c(1, 0, 1) ),                     .Names = c("id", "birth", "entry", "exit", "fail"),                  row.names = c("1", "2", "3"),                      class = "data.frame" )# Convert the character dates into numerical variables (fractional years)xcoh$bt <- cal.yr( xcoh$birth )xcoh$en <- cal.yr( xcoh$entry )xcoh$ex <- cal.yr( xcoh$exit  )# Define as Lexis object with timescales calendar time and ageLcoh <- Lexis( entry = list( per=en ),                exit = list( per=ex, age=ex-bt ),         exit.status = factor( fail, 0:1, c("Alive","Dead") ),                data = xcoh )str( Lcoh )Lx <- Lcoh[,1:7]# Data frame with clinical examination data, date of examination in perclin <- data.frame(lex.id = c(1,1,3,2),                      per = cal.yr(c("1977-4-7",                                     "1971-7-1",                                     "1996-2-15",                                     "1990-7-3")),                       bp = c(120,140,160,157),                     chol = c(5,7,8,9),                     xnam = c("X2","X1","X1","X2") )Lxclinstr(Lx)str(clin)# Different behavours when using exnam explicitlyaddCov.Lexis( Lx, clin[,-5] )addCov.Lexis( Lx, clin, exnam="xnam" )# Works with time split BEFORELb <- addCov.Lexis(splitLexis(Lx,                              time.scale="age",                              breaks=seq(0,80,5) ),                   clin,                   exnam="clX" )Lb# and also AFTERLa <- splitLexis(addCov.Lexis( Lx,                             clin,                            exnam = "xnam" ),                 breaks=seq(0,80,5),                 time.scale="age" )LaLa$tfc == Lb$tfcLa$age == Lb$agestr(La)str(Lb)

Expand a Lexis object with information of drug exposure based onpurchase dates and -amounts

Description

ALexis object will contain information on follow-up for acohort of persons through time, each record containing information ofone time interval, including the time at the beginning of eachinterval. If information on drug purchase is known for the persons vialex.id in a list of data frames,addDrug.Lexis will expandtheLexis object by cutting at all drug purchase dates, andcompute the exposure status for any number of drugs, and add these asvariables.

In some circumstances the result is a Lexis object with a very largenumber of very small follow-up intervals. The functioncoarse.Lexis combines consecutive follow-up intervals using thecovariates from the first of the intervals.

Usage

## S3 method for class 'Lexis'addDrug(Lx,  # Lexis object            pdat,  # list of data frames with drug purchase information             amt = "amt", # name of the variable with purchased amount             dpt = "dpt", # name of the variable with amount consumed per time             apt = NULL,  # old name for dpt          method = "ext", # method use to compute exposure             maxt = NULL,  # max duration for a purchase when using "fix"           grace = 0,     # grace  period to be added            tnam = setdiff(names(pdat[[1]]), c("lex.id", amt))[1],                          # name of the time variable from Lx          prefix = TRUE,  # should drug names prefix variable names           sepfix = ".",   # what should the separator be when forming prefix/suffix         verbose = TRUE,           ...)coarse.Lexis(Lx, lim, keep = FALSE)

Arguments

Lx

ALexis object.

pdat

Named list of data frames with drugpurchasedata.

amt

Name of the variable in the data frames inpdat withthe purchasedamount.

dpt

Name of the variable in the data frames inpdat withthe consumeddosepertime. Must be given inunits of units ofamt per units oflex.dur inLx.

apt

Name previously used fordpt. Will disappear in nextversion.

method

Character. One of"ext" (default),"dos"or"fix", for a description, see details.

maxt

Numerical. Maximal duration for a purchase when usingmethod="fix", same units aslex.dur.

grace

Numeric. Grace period to be added after last time ofcomputed drug coverage to define end of exposure, same units aslex.dur.

tnam

Character. Name of the timescale used in the data framesinpdat.

prefix

Logical. Should the names ofpdat be used asprefix for the 4 generated exposure variables for each drug. If falsethe names ofpdat will be used as suffix.

sepfix

Character, used to separate theprefix and thename of the generated type of variable.

verbose

Logical. Should the function tell you about the choicesyou made?

lim

Numeric vector of length 2. Consecutive follow-up intervalsare combined if the first haslex.dur <lim[1], and thesum oflex.dur in the two intervals is smaller thanlim[2]. If a scalar i given,c(lim,3*lim) is used.

keep

Logical of length 1 ornrow(Lx) that points torecords that cannot be combined with preceding records.

...

Arguments passed on. Ignored.

Details

This function internally usesaddCov.Lexis to attachexposure status for several drugs (dispensed medicine) to follow-up in aLexis object. Once that is done, the exposure measures arecalculated at each time.

There is one input data frame per type of drug, each with variableslex.id,amt, a timescale variable and possibly a variabledpt.

Three different methods for computing drug exposures from dates andamounts of purchases are supported via the argumentmethod.

"ext": Extrapolation: the first drug purchase is assumedconsumed over the interval to the second purchase. Exposurefor subsequent purchases are assumed to last as long as it would have ifconsumed at a speed corresponding to the previous purchase beingconsumed over the time span between the previous and current purchase,plus a period of lengthgrace.
"dos": Dosage: assumes that each purchase lastsamt/dpt plusgrace.
"fix": Fixed time: assumes that each purchase lastsmaxt.

So for each purchase we have defined an end of coverage (expirydate). If next purchase is before this, we assume that the amountpurchased is consumed over the period between the two purchases,otherwise over the period to the end of coverage. So the only differencebetween the methods is the determination of the coverage for eachpurchase.

Based on this, for each date in the resultingLexis fourexposure variables are computed, see next section.

Value

ALexis object with the same risk time, states and eventsasLx. The follow-up for each person has been cut at the purchasetimes of each of the drugs, as well as at the expiry times for each drugcoverage. Further, for each drug (i.e. the data frame in thepdatlist) the name of thepdat component determines the prefix forthe 4 variables that will be added. Supposing this isAA for agiven drug, then 4 new variables will be:

AA.ex: logical; is the person exposed in this interval
AA.tf: numeric: time since first purchase, same units aslex.dur
AA.ct: numeric: cumulative time on the drug, same unitsaslex.dur
AA.cd: numeric: cumulative dose of the drug, same unitsasamt

So ifpdat is a list of length 3 with namesc("a","b","c")the function will add variablesa.ex, a.tf, a.ct, a.cd, b.ex, b.tf, b.ct, b.cd, c.ex, c.tf, c.ct, c.cd

Author(s)

Bendix Carstensen,http://bendixcarstensen.com

Examples

# Follow-up of 2 personsclear()fu <- data.frame(doe = c(2006, 2008),                 dox = c(2015, 2018),                 dob = c(1950, 1951),                 xst = factor(c("A","D")))Lx <- Lexis(entry = list(per = doe,                         age = doe- dob),             exit = list(per = dox),      exit.status = xst,             data = fu)Lx <- subset(Lx, select = -c(doe, dob, dox, xst))# split FU in 1 year intervalsSx <- splitLexis(Lx, "per", breaks = seq(1990, 2020, 1.0))# drug purchases, one data frame for each drug ra <- data.frame(per = c(2007 + runif(12,0,10)),                 amt = sample(2:4, 12, r = TRUE),              lex.id = sample(1:2, 12, r = TRUE))ra <- ra[order(ra$lex.id, ra$per),]rb <- data.frame(per = c(2009 + runif(10, 0, 10)),                 amt = sample(round(2:4/3,1), 10, r = TRUE),              lex.id = sample(1:2, 10, r = TRUE))rb <- rb[order(rb$lex.id, rb$per),]# put in a named listpdat <- list(A = ra, B = rb)pdatex1 <- addDrug.Lexis(Sx, pdat, method = "ext") # defaultsummary(ex1)# collapsing some of the smaller intervals with the nextsummary(coarse.Lexis(ex1, c(0.2,0.5)))ex2 <- addDrug.Lexis(Sx, pdat, method = "ext", grace = 0.2)dos <- addDrug.Lexis(Sx, pdat, method = "dos", dpt = 6)fix <- addDrug.Lexis(Sx, pdat, method = "fix", maxt = 1)

Fit Age-Period-Cohort models and Lee-Carter models with effectsmodeled by natural splines.

Description

apc.LCa fits an Age-Period-Cohort model and sub-models (usingapc.fit) as well as Lee-Carter models (usingLCa.fit).show.apc.LCa plots the models in littleboxes with their residual deviance with arrows showing theirrelationships.

Usage

apc.LCa( data,  keep.models = FALSE,          ... )show.apc.LCa( x,       dev.scale = TRUE,             top = "Ad", ... )

Arguments

data

A data frame that must have columnsA,P,D andY, see e.g.apc.fit

keep.models

Logical. Should theapc object and the 5LCa objects be returned too?

...

Further parameters passed on toLCa.fit orboxes.matrix.

x

The result from a call toapc.LCa.

dev.scale

Should the vertical position of the boxes with themodels be scales relative to the deviance between the Age-driftmodel and the extended Lee-Carter model?

top

The model presented at the top of the plot of boxes(together with any other model with larger deviance) whenvertical position is scaled by deviances. Only "Ad", "AP", "AC","APa" or "ACa" will make sense.

Details

The functionapc.LCa fits all 9 models (well, 10) available asextension and sub-models of the APC-model and compares them byreturning deviance and residual df.

Value

A 9 by 2 matrix classified by model and deviance/df; optionally(ifmodels=TRUE) a list with the matrix asdev,apc, anapc object (fromapc.fit), andLCa, a listwith 5LCa objects (fromLCa.fit).

Author(s)

Bendix Carstensen,http://bendixcarstensen.com

Examples

library( Epi )clear()# Danish lung cancer incidence in 5x5x5 Lexis trianglesdata( lungDK )lc <- subset( lungDK, Ax>40 )[,c("Ax","Px","D","Y")]names( lc )[1:2] <- c("A","P")head( lc )al <- apc.LCa( lc, npar=c(9,6,6,6,10), keep.models=TRUE, maxit=500, eps=10e-3 )show.apc.LCa( al, dev=TRUE )# Danish mortality data## Not run: data( M.dk )mdk <- subset( M.dk, sex==1 )[,c("A","P","D","Y")]head( mdk )al <- apc.LCa( mdk, npar=c(15,15,20,6,6), maxit=50, eps=10e-3,               quiet=FALSE, VC=FALSE )show.apc.LCa( al, dev=FALSE )show.apc.LCa( al, dev=TRUE )show.apc.LCa( al, top="AP" )# Fit a reasonable model to Danish mortality data and plot resultsmAPa <- LCa.fit( mdk, model="APa", npar=c(15,15,20,6,6), c.ref=1930,                 a.ref=70, quiet=FALSE, maxit=250 )par( mfrow=c(1,3) )plot( mAPa ) ## End(Not run)

Fit an Age-Period-Cohort model to tabular data.

Description

Fits the classical five models to tabulated rate data (cases,person-years) classified by two of age, period, cohort:Age, Age-drift, Age-Period, Age-Cohort and Age-Period-Cohort. There are noassumptions about the age, period or cohort classes being of the samelength, or that tabulation should be only by two of the variables.Only requires that mean age and period for each tabulation unit is given.

Usage

apc.fit( data,            A,            P,            D,            Y,        ref.c,        ref.p,          dist = c("poisson","binomial"),         model = c("ns","bs","ls","factor"),       dr.extr = "Y",          parm = c("ACP","APC","AdCP","AdPC","Ad-P-C","Ad-C-P","AC-P","AP-C"),          npar = c( A=5, P=5, C=5 ),         scale = 1,         alpha = 0.05,     print.AOV = TRUE )

Arguments

data

Data frame with (at least) variables,A (age),P (period),D (cases, deaths) andY(person-years). Cohort (date of birth) is computed asP-A.If this argument is given the argumentsA,P,D andY are ignored.

A

Age; numerical vector with mean age at diagnosis for each unit.

P

Period; numerical vector with mean date of diagnosis for eachunit.

D

Cases, deaths; numerical vector.

Y

Person-years; numerical vector. Also used as denominator for binomialdata, see thedist argument.

ref.c

Reference cohort, numerical. Defaults to median date ofbirth among cases. If used withparm="AdCP" orparm="AdPC",the residual cohort effects will be 1 atref.c

ref.p

Reference period, numerical. Defaults to median date ofdiagnosis among cases.

dist

Distribution (or more precisely: Likelihood) used for modeling.if a binomial model us used,Y is assumed to be thedenominator;"binomial" gives a binomial model with logitlink. The Age-effects returned are converted to theprobability scale, Period and Cohort effects are still odds-ratios.

model

Type of model (covariate effects) fitted:

ns fits a model with natural splines for each ofthe terms, withnpar parameters for the terms.
bs fits a model with B-splines for each ofthe terms, withnpar parameters for the terms.
ls fits a model with linear splines.
factor fits a factor model with one parameterper value ofA,P andP-A.nparis ignored in this case.

dr.extr

Character or numeric.How the drift parameter should be extracted from theage-period-cohort model. Specifies the inner product used fordefinition of orthogonality of the period / cohort effects to thelinear effects — in terms of a diagonal matrix.

"Y" (default) uses the no. person-time,Y,corresponding to the observed information about the square root ofthe rate.

"R" or"L" usesY*Y/D corresponding to theobserved information about the rate (usually termed "lambda", hencethe "L").

"D" or"T" uses the no. events as the weight in theinner product, corresponding to the information about the log-rate(usually termed "theta", hence the "T").

If given"n" (naive) (well, in fact any other character value) willinduce the use of the standard inner product putting equal weight onall units in the dataset.

Ifdr.extr is a numeric vector this is used as the diagonalof the matrix inducing the inner product.

Ifdr.extr is a numeric scalar,D + dr.extr*Y is usedas the diagonal of the matrix inducing the inner product. Thisfamily of inner products are the only ones that meet thesplit-observation invariance criterion.

The setting of this parameter has no effect on the fit of the model,it only influences the parametrization returned in theAge,Per andCoh elements of the resulting list.

parm

Character. Indicates the parametrization of the effects.The first four refer to the ML-fit of the Age-Period-Cohort model,the last four give Age-effects from a smaller model and residualsrelative to this. If one of the latter is chosen, the argumentdr.extr is ignored. Possible values forparm are:

"ACP": ML-estimates. Age-effects as rates for thereference cohort. Cohort effects as RR relative to the referencecohort. Period effects constrained to be 0 on average with 0 slope.
"APC": ML-estimates. Age-effects as rates for thereference period. Period effects as RR relative to the referenceperiod. Cohort effects constrained to be 0 on average with 0 slope.
"AdCP": ML-estimates. Age-effects as rates for thereference cohort. Cohort and period effects constrained to be 0 onaverage with 0 slope. In this case returned effects do notmultiply to the fitted rates, the drift is missing and needs to beincluded to produce the fitted values.
"AdPC": ML-estimates. Age-effects as rates for thereference period. Cohort and period effects constrained to be 0 onaverage with 0 slope. In this case returned effects do notmultiply to the fitted rates, the drift is missing and needs to beincluded to produce the fitted values.
"Ad-C-P": Age effects are rates for the referencecohort in the Age-drift model (cohort drift). Cohort effects are from the modelwith cohort alone, using log(fitted values) from the Age-driftmodel as offset. Period effects are from the model with periodalone using log(fitted values) from the cohort model as offset.
"Ad-P-C": Age effects are rates for the referenceperiod in the Age-drift model (period drift). Period effects are from the modelwith period alone, using log(fitted values) from the Age-driftmodel as offset. Cohort effects are from the model with cohortalone using log(fitted values) from the period model as offset.
"AC-P": Age effects are rates for the referencecohort in the Age-Cohort model, cohort effects are RR relative tothe reference cohort. Period effects are from the modelwith period alone, using log(fitted values) from the Age-Cohortmodel as offset.
"AP-C": Age effects are rates for the referenceperiod in the Age-Period model, period effects are RR relative tothe reference period. Cohort effects are from the modelwith cohort alone, using log(fitted values) from the Age-Periodmodel as offset.

npar

The number of parameters/knots to use for each of the terms inthe model. If it is vector of length 3, the numbers are taken as theno. of knots for Age, Period and Cohort, respectively. Unless it hasa names attribute with values "A", "P" and "C" in which case thesewill be used. The knots chosen are the quantiles(1:nk-0.5)/nk of the events (i.e. ofrep(A,D) andsimilarly forP andC).

npar may also be a named list of three numerical vectors withnames "A", "P" and "C", in which case these taken as the knots forthe age, period and cohort effect, the smallest and largest element ineach vector are used as the boundary knots.

alpha

The significance level. Estimates are given with(1-alpha) confidence limits.

scale

numeric(1), factor multiplied to the rate estimates before output.

print.AOV

Should the analysis of deviance table for the modelsbe printed?

Details

Each record in the input data frame represents a subset of a Lexisdiagram. The subsets need not be of equal length on the age andperiod axes, in fact there are no restrictions on the shape ofthese; they could be Lexis triangles for example. The requirement isthatA andP are coded with the mean age and calendartime of observation in the subset. This is essential sinceAandP are used as quantitative variables in the models.

This approach is different from to the vast majority of the uses ofAPC-models in the literature where a factor model is used for age,period and cohort effects. The latter can be obtained by usingmodel="factor". Note however that the cohort factor is definedfromA andP, so that it is not possible in thisframework to replicate the Boyle-Robertson fallacy.

Value

An object of class "apc" (recognized byapc.plot andapc.lines) — a list with components:

Type

Text describing the model and parametrization returned.

Model

The model object(s) on which the parametrization is based.

Age

Matrix with 4 columns:A.pt with the ages (equalsunique(A)) and three columns giving the estimated rates withc.i.s.

Per

Matrix with 4 columns:P.pt with the dates ofdiagnosis (equalsunique(P)) and three columns giving theestimated RRs with c.i.s.

Coh

Matrix with 4 columns:C.pt with the dates of birth(equalsunique(P-A)) and three columns giving the estimatedRRs with c.i.s.

Drift

A 3 column matrix with drift-estimates and c.i.s: Thefirst row is the ML-estimate of the drift (as defined bydrift), the second row is the estimate from the Age-driftmodel. The first row name indicates which type of inner product wereused for projections. For the sequential parametrizations, only thelatter is given.

Ref

Numerical vector of length 2 with reference period and cohort.If ref.p or ref.c was not supplied the corresponding element is NA.

Anova

Analysis of deviance table comparing the five classicalmodels.

Knots

Ifmodel is one of"ns" or"bs", a listwith three components:Age,Per,Coh, each one avector of knots. The max and the min of the vectors are the boundary knots.

Author(s)

Bendix Carstensen,http://bendixcarstensen.com

References

The considerations behind the parametrizations used in this functionare given in detail in:B. Carstensen: Age-Period-Cohort models for the Lexis diagram.Statistics in Medicine, 10; 26(15):3018-45, 2007.

Various links to course material etc. is available throughhttp://bendixcarstensen.com/APC/

Examples

library( Epi )data(lungDK)# Taylor a dataframe that meets the requirements for variable namesexd <- lungDK[,c("Ax","Px","D","Y")]names(exd)[1:2] <- c("A","P")# Three different ways of parametrizing the APC-model, MLex.1 <- apc.fit( exd, npar=7, model="ns", dr.extr="1", parm="ACP", scale=10^5 )ex.D <- apc.fit( exd, npar=7, model="ns", dr.extr="D", parm="ACP", scale=10^5 )ex.Y <- apc.fit( exd, npar=7, model="ns", dr.extr="Y", parm="ACP", scale=10^5 )# Sequential fit, first AC, then P given AC.ex.S <- apc.fit( exd, npar=7, model="ns", parm="AC-P", scale=10^5 )# Show the estimated driftsex.1[["Drift"]]ex.D[["Drift"]]ex.Y[["Drift"]]ex.S[["Drift"]]# Plot the effectslt <- c("solid","22")[c(1,1,2)]apc.plot( ex.1, lty=c(1,1,3) )apc.lines( ex.D, col="red", lty=c(1,1,3) )apc.lines( ex.Y, col="limegreen", lty=c(1,1,3) )apc.lines( ex.S, col="blue", lty=c(1,1,3) )

Produce an empty frame for display of parameter-estimates fromAge-Period-Cohort-models.

Description

A plot is generated where both the age-scale and the cohort/periodscale is on the x-axis. The left vertical axis will be a logarithmicrate scale referring to age-effects and the right a logarithmicrate-ratio scale of the same relative extent as the left referring tothe cohort and period effects (rate ratios).

Only an empty plot frame is generated. Curves or points must be addedwithpoints,lines or the special utility functionapc.lines.

Usage

  apc.frame( a.lab,            cp.lab,             r.lab,            rr.lab = r.lab / rr.ref,            rr.ref = r.lab[length(r.lab)/2],             a.tic = a.lab,            cp.tic = cp.lab,             r.tic = r.lab,            rr.tic = r.tic / rr.ref,           tic.fac = 1.3,             a.txt = "Age",            cp.txt = "Calendar time",             r.txt = "Rate per 100,000 person-years",            rr.txt = "Rate ratio",          ref.line = TRUE,               gap = diff(range(c(a.lab, a.tic)))/10,          col.grid = gray(0.85),             sides = c(1,2,4) )

Arguments

a.lab

Numerical vector of labels for the age-axis.

cp.lab

Numerical vector of labels for the cohort-period axis.

r.lab

Numerical vector of labels for the rate-axis (left vertical)

rr.lab

Numerical vector of labels for the RR-axis (right vertical)

rr.ref

At what level of the rate scale is the RR=1 to be.

a.tic

Location of additional tick marks on the age-scale

cp.tic

Location of additional tick marks on the cohort-period-scale

r.tic

Location of additional tick marks on the rate-scale

rr.tic

Location of additional tick marks on the RR-axis.

tic.fac

Factor with which to diminish intermediate tick marks

a.txt

Text for the age-axis (left part of horizontal axis).

cp.txt

Text for the cohort/period axis (right part ofhorizontal axis).

r.txt

Text for the rate axis (left vertical axis).

rr.txt

Text for the rate-ratio axis (right vertical axis)

ref.line

Logical. Should a reference line at RR=1 be drawn at thecalendar time part of the plot?

gap

Gap between the age-scale and the cohort-period scale

col.grid

Colour of the grid put in the plot.

sides

Numerical vector indicating on which sides axes shouldbe drawn and annotated. This option is aimed for multi-paneldisplays where axes only are put on the outer plots.

Details

The function produces an empty plot frame for display of resultsfrom an age-period-cohort model, with age-specific rates in the leftside of the frame and cohort and period rate-ratio parameters in theright side of the frame. There is a gap ofgap between theage-axis and the calendar time axis, vertical grid lines atc(a.lab,a.tic,cp.lab,cp.tic), and horizontal grid lines atc(r.lab,r.tic).

The function returns a numerical vector oflength 2, with namesc("cp.offset","RR.fac"). The y-axis forthe plot will be a rate scale for the age-effects, and the x-axis willbe the age-scale. The cohort and period effects are plotted bysubtracting the first element (named"cp.offset") of the returned resultform the cohort/period, and multiplying the rate-ratios by the secondelement of the returned result (named"RR.fac").

Value

A numerical vector of length two, with namesc("cp.offset","RR.fac"). The first is the offset for the cohortperiod-axis, the second the multiplication factor for the rate-ratioscale.

Side-effect: A plot with axes and grid lines but no points or curves.Moreover, the optionapc.frame.par is given the valuec("cp.offset","RR.fac"), which is recognized byapc.plotandapc.lines.

Author(s)

Bendix Carstensen, Steno Diabetes Center,http://bendixcarstensen.com

References

B. Carstensen: Age-Period-Cohort models for the Lexisdiagram. Statistics in Medicine, 26: 3018-3045, 2007.

Examples

par( mar=c(4,4,1,4) )res <-apc.frame( a.lab=seq(30,90,20), cp.lab=seq(1880,2000,30), r.lab=c(1,2,5,10,20,50),           a.tic=seq(30,90,10), cp.tic=seq(1880,2000,10), r.tic=c(1:10,1:5*10),           gap=27 )res# What are the axes actually?par(c("usr","xlog","ylog"))# How to plot in the age-part: a point at (50,10)points( 50, 10, pch=16, cex=2, col="blue" )# How to plot in the cohort-period-part: a point at (1960,0.3)points( 1960-res[1], 0.3*res[2], pch=16, cex=2, col="red" )# or referring to the period-cohort part of the plot pc.points( 1960, 0.3, pch=16, cex=1, col="green" )

Plot APC-estimates in an APC-frame.

Description

When an APC-frame has been produced byapc.frame, thisfunction draws a set of estimates from an APC-fit in the frame. Anoptional drift parameter can be added to the period parameters andsubtracted from the cohort and age parameters.

Usage

## S3 method for class 'apc'lines( x, P, C,        scale = c("log","ln","rates","inc","RR"),    frame.par = options()[["apc.frame.par"]],        drift = 0,           c0 = median( C[,1] ),           a0 = median( A[,1] ),           p0 = c0 + a0,           ci = rep( FALSE, 3 ),          lwd = c(3,1,1),          lty = 1,          col = "black",         type = "l",        knots = FALSE,        shade = FALSE,          ... ) apc.lines( x, P, C,        scale = c("log","ln","rates","inc","RR"),    frame.par = options()[["apc.frame.par"]],        drift = 0,           c0 = median( C[,1] ),           a0 = median( A[,1] ),           p0 = c0 + a0,           ci = rep( FALSE, 3 ),          lwd = c(3,1,1),          lty = 1,          col = "black",         type = "l",        knots = FALSE,        shade = FALSE,          ... )

Arguments

x

If anapc-object, (seeapc.fit), thenthe argumentsP,C,c0,a0 andp0are ignored, and the estimates fromx plotted.

Can also be a 4-column matrix with columns age, age-specificrates, lower and upper c.i., in which case period and cohort effectsare taken from the argumentsP andC.

P

Period effects. Rate-ratios. Same form as for the age-effects.

C

Cohort effects. Rate-ratios. Same form as for the age-effects.

scale

Are effects given on a log-scale? Character variable, oneof"log","ln","rates","inc","RR". If"log" or"ln" it is assumed thateffects are log(rates) and log(RRs) otherwise the actual effects areassumed given inA,P andC. IfA is ofclassapc, it is assumed to be"rates".

frame.par

2-element vector with the cohort-period offset andRR multiplicator. This will typically be the result from the call ofapc.frame. See this for details.

drift

The drift parameter to be added to the period effect. Ifscale="log" this is assumed to be on the log-scale, otherwiseit is assumed to be a multiplicative factor per unit of the firstcolumns ofA,P andC

c0

The cohort where the drift is assumed to be 0; the subtracteddrift effect isdrift*(C[,1]-c0).

a0

The age where the drift is assumed to be 0.

p0

The period where the drift is assumed to be 0.

ci

Should confidence interval be drawn. Logical orcharacter. If character, any occurrence of"a" or"A"produces confidence intervals for the age-effect. Similarly forperiod and cohort.

lwd

Line widths for estimates, lower and upper confidence limits.

lty

Linetypes for the three effects.

col

Colours for the three effects.

type

What type of lines / points should be used.

knots

Should knots from the model be shown?

shade

Should confidence intervals be plotted as shaded areas?If true, the setting ofci is ignored.

...

Further parameters to be transmitted topointslines,matpoints ormatlines usedfor plotting the three sets of curves.

Details

There is no difference between the functionsapc.lines andlines.apc, except the the latter is thelines methodforapc objects.

The drawing of three effects in an APC-frame is a rather trivial task,and the main purpose of the utility is to provide a function thateasily adds the functionality of adding a drift so that several setsof lines can be easily produced in the same frame.

Value

apc.lines returns (invisibly) a list of three matrices of theeffects plotted.

Author(s)

Bendix Carstensen, Steno Diabetes Center,http://bendixcarstensen.com

A case-control study of endometrial cancer

Description

Thebdendo data frame has 315 rows and 13 columns,bdendo11 126 rows. These data concern a study in which each caseof endometrial cancer was matched with 4 controls.bdendo11 is a1:1 mathed subset ofbdendo. Matching was by date of birth(within one year), marital status, and residence.

Format

These data frames have the following columns:

`set`:	Case-control set: a numeric vector
`d`:	Case or control: a numeric vector (1=case, 0=control)
`gall`:	Gall bladder disease: a factor with levels`NoYes`.
`hyp`:	Hypertension: a factor with levels`NoYes`.
`ob`:	Obesity: a factor with levels`NoYes`.
`est`:	A factor with levels`NoYes`.
`dur`:	Duration of conjugated oestrogen therapy: a factor with levels`0`,`1`,`2`,`3`,`4`.
`non`:	Use of non oestrogen drugs: a factor with levels`NoYes`.
`duration`:	Months of oestrogen therapy: a numeric vector.
`age`:	A numeric vector.
`cest`:	Conjugated oestrogen dose: a factor with levels`0`,`1`,`2`,`3`.
`agegrp`:	A factor with levels`55-5960-6465-6970-7475-7980-84`
`age3`:	a factor with levels`<6465-7475+`

Source

Breslow NE, and Day N, Statistical Methods in Cancer Research. VolumeI: The Analysis of Case-Control Studies. IARC ScientificPublications, IARC:Lyon, 1980.

Examples

data(bdendo)str(bdendo)

Births in a London Hospital

Description

Data from 500 singleton births in a London Hospital

Usage

data(births)

Format

A data frame with 500 observations on the following 8 variables.

`id`:	Identity number for mother and baby.
`bweight`:	Birth weight of baby.
`lowbw`:	Indicator for birth weight less than 2500 g.
`gestwks`:	Gestation period.
`preterm`:	Indicator for gestation period less than 37 weeks.
`matage`:	Maternal age.
`hyp`:	Indicator for maternal hypertension.
`sex`:	Sex of baby: 1:Male, 2:Female.

Source

Anonymous

References

Michael Hills and Bianca De Stavola (2002). A Short Introduction toStata 8 for Biostatistics, Timberlake Consultants Ltd

Examples

data(births)

Bladder cancer mortality in Italian males

Description

Number of deaths from bladder cancer and person-years in the Italianmale population 1955–1979, in ages 25–79.

Format

A data frame with 55 observations on the following 4 variables:

`age`:	Age at death. Left endpoint of age class
`period`:	Period of death. Left endpoint of period
`D`:	Number of deaths
`Y`:	Number of person-years.

Examples

data(blcaIT)

Create a bootstrap sample of persons (as identified by`lex.id`) from a Lexis object

Description

lex.id is the person identifier in aLexisobject. This is used to sample persons from a Lexis object. If a personis sampled, all records from this persons is transported to thebootstrap sample.

Usage

nid( Lx, ... )## S3 method for class 'Lexis'nid( Lx, by=NULL, ... )bootLexis( Lx, size = NULL, by = NULL, replace=TRUE )

Arguments

Lx

ALexis object.

...

Parameters passed on to other methods.

size

Numeric. How many persons should be sampled from theLexis object. Defaults to the number of persons in theLx, or, ifby is given, to the number of persons ineach level ofby. Ifby is given,size can have lengthlength(unique(by)), to indicate how many are sampled fromeach level ofby.

by

Character. Name of a variable (converted to factor) in theLexis object.

Bootstrap sampling is done within each level of by.

Calculation of the number of persons (lex.id) is done withineach level ofby, and a vector returned.

replace

Should persons be sampled by replacement? Default isTRUE. Settingreplace toFALSE enablesselecting a random subset of persons from the Lexis object.

Value

bootLexis returns a Lexis object of the same structure as theinput, withpersons bootstrapped. The variablelex.idin the resultingLexis object has values 1,2,... The originalvalues oflex.id fromLx are stored in the variableold.id.

nid counts the number of persons in a Lexis object, possibly byby. Ifby is given, a named vector is returned.

Author(s)

Bendix Carstensen,http://bendixcarstensen.com.

Examples

# A small bogus cohortxcoh <- data.frame( id = c("A", "B", "C"),                 birth = c("1952-07-14", "1954-04-01", "1987-06-10"),                 entry = c("1965-08-04", "1972-09-08", "1991-12-23"),                  exit = c("1997-06-27", "1995-05-23", "1998-07-24"),                  fail = c(1, 0, 1),                   sex = c("M","F","M") )# Convert to calendar yearsfor( i in 2:4 ) xcoh[,i] <- cal.yr(xcoh[,i])xcoh <- xcoh[sample(1:3, 10, replace = TRUE),]xcoh$entry <- xcoh$entry + runif(10, 0, 10)xcoh$exit  <- xcoh$entry + runif(10, 0, 10)Lcoh <- Lexis(entry = list(per = entry),               exit = list(per = exit,                           age = exit - birth),        exit.status = fail,               data = xcoh)LcohLx <- splitLexis(Lcoh, breaks = 0:10 * 10, "age")Lxnid(Lx)nid(Lx, by="sex")Lb <- bootLexis(Lx)head(Lb)nid(bootLexis(Lx, size = 7))Li <- bootLexis(Lx, by = "id") # superfluoussummary(Lx)summary(Li)L2 <- bootLexis(Lx, by = "sex", size = c(2, 5))nid(L2, by = "sex")summary(L2, by = "sex")

Draw boxes and arrows for illustration of multistate models.

Description

Boxes can be drawn with text (tbox) or a cross (dbox),and arrows pointing between the boxes (boxarr) can be drawnautomatically not overlapping the boxes. Theboxes method forLexis objects generates displays of states withperson-years and transitions with events or rates.

Usage

   tbox( txt, x, y, wd, ht,         font=2, lwd=2,         col.txt=par("fg"),         col.border=par("fg"),         col.bg="transparent" )   dbox( x, y, wd, ht=wd,         font=2, lwd=2, cwd=5,         col.cross=par("fg"),         col.border=par("fg"),         col.bg="transparent"  )   boxarr( b1, b2, offset=FALSE, pos=0.45, ... )## S3 method for class 'Lexis'boxes( obj,                    boxpos = FALSE,                     wmult = 1.20,                     hmult = 1.20 + 0.85*(!show.Y),                       cex = 1.40,                    show   = inherits( obj, "Lexis" ),                    show.Y = show,                   scale.Y = 1,                  digits.Y = 1,                   show.BE = FALSE,                    BE.sep = c("","","          ",""),                    show.D = show,                   scale.D = FALSE,                  digits.D = as.numeric(as.logical(scale.D)),                    show.R = show & is.numeric(scale.R),                   scale.R = 1,                  digits.R = as.numeric(as.logical(scale.R)),                    DR.sep = if( show.D ) c("\n(",")") else c("",""),                     eq.wd = TRUE,                     eq.ht = TRUE,                        wd,                        ht,                    subset = NULL,                   exclude = NULL,                      font = 1,                       lwd = 2,                   col.txt = par("fg"),                col.border = col.txt,                    col.bg = "transparent",                   col.arr = par("fg"),                   lwd.arr = lwd,                  font.arr = font,                   pos.arr = 0.45,                   txt.arr = NULL,               col.txt.arr = col.arr,                offset.arr = 2,                             ... )## S3 method for class 'matrix'boxes( obj, ... )## S3 method for class 'MS'boxes( obj, sub.st, sub.tr, cex=1.5, ... )   fillarr( x1, y1, x2, y2, gap=2, fr=0.8,            angle=17, lwd=2, length=par("pin")[1]/30, ... )

Arguments

txt

Text to be placed inside the box.

x

x-coordinate of center of box.

y

y-coordinate of center of box.

wd

width of boxes in percentage of the plot width.

ht

height of boxes in percentage of the plot height.

font

Font for the text. Defaults to 2 (=bold).

lwd

Line width of the box borders.

col.txt

Color for the text in boxes.

col.border

Color of the box border.

col.bg

Background color for the interior of the box.

...

Arguments to be passed on to the call of other functions.

cwd

Width of the lines in the cross.

col.cross

Color of the cross.

b1

Coordinates of the "from" box. A vector with 4 components,x,y,w,h.

b2

Coordinates of the "to" box; likeb1.

offset

Logical. Should the arrow be offset a bit to the left.

pos

Numerical between 0 and 1, determines the position of the pointon the arrow which is returned.

obj

ALexis object or a transition matrix; thatis a square matrix indexed by state in both dimensions, and the(i,j)th entry different fromNA if a transitionitoj can occur. Ifshow.D=TRUE, the arrows betweenstates are annotated by these numbers. Ifshow.Y=TRUE, theboxes representing states are annotated by the numbers in thediagonal ofobj.

Forboxes.matrixobj is a matrix and forboxes.MS,obj is anMS.boxes object (see below).

boxpos

IfTRUE the boxes are positioned equidistantly on acircle, ifFALSE (the default) you are queried toclick on the screen for the positions. This argument can alsobe a named list with elementsx andy, both numerical vectors, giving the centers ofthe boxes. These must be numbers between 0 and 100indicating percentages of the display in the two directions.

wmult

Multiplier for the width of the box relative to the width of thetext in the box.

hmult

Multiplier for the height of the box relative to the height of thetext in the box.

cex

Character expansion for text in the box.

show

Should person-years and transitions be put in the plot.Ignored ifobj is not aLexis object.

show.Y

If logical: Should person-years be put in the boxes.If numeric: Numbers to put in boxes.

scale.Y

What scale should be used for annotation of person-years.

digits.Y

How many digits after the decimal point should be used for theperson-years.

show.BE

Logical. Should number of persons beginningresp. ending follow up in each state be shown? If given as character"nz" or "noz" the numbers will be shown, but zeros omitted.

BE.sep

Character vector of length 4, used for annotation of thenumber of persons beginning and ending in each state: 1st elementprecedes no. beginning, 2nd trails it, 3rd precedes the no. ending(defaults to 8 spaces), and the 4th trails the no. ending.

show.D

Should no. transitions be put alongside the arrows.Ignored ifobj is not aLexis object.

scale.D

Synonymous withscale.R, retained for compatibility.

digits.D

Synonymous withdigits.R, retained for compatibility.

show.R

Should the transition rates be shown on the arrows?

scale.R

If this a scalar, rates instead of no. transitions are printedat the arrows, scaled byscale.R.

digits.R

How many digits after the decimal point should be used for therates.

DR.sep

Character vector of length 2. If rates are shown, thefirst element is inserted before and the second after the rate.

eq.wd

Should boxes all have the same width?

eq.ht

Should boxes all have the same height?

subset

Draw only boxes and arrows for a subset of the states.Can be given either as a numerical vector or charactervector state names.

exclude

Exclude states from the plot. The complementary ofsubset.Ignored ifsubset is given.

col.arr

Color of the arrows between boxes.A vector of character strings, the arrows are referred to as therow-wise sequence of non-NA elements of the transition matrix.Thus the first ones refer to the transitions out of state 1, inorder of states.

lwd.arr

Line widths of the arrows.

font.arr

Font of the text annotation the arrows.

pos.arr

Numerical between 0 and 1, determines the position onthe arrows where the text is written.

txt.arr

Text put on the arrows.

col.txt.arr

Colors for text on the arrows.

offset.arr

The amount offset between arrows representingtwo-way transitions, that is where there are arrows both waysbetween two boxes.

sub.st

Subset of the states to be drawn.

sub.tr

Subset of the transitions to be drawn.

x1

x-coordinate of the starting point.

y1

y-coordinate of the starting point.

x2

x-coordinate of the end point.

y2

y-coordinate of the end point.

gap

Length of the gap between the box and the ends of the arrows.

fr

Length of the arrow as the fraction of the distance between theboxes. Ignored unless given explicitly, in which case any valuegiven forgap is ignored.

angle

What angle should the arrow-head have?

length

Length of the arrow head in inches. Defaults to 1/30 of thephysical width of the plot.

Details

These functions are designed to facilitate the drawing of multistatemodels, mainly by automatic calculation of the arrows between boxes.

tbox draws a box with centered text, and returns a vector oflocation, height and width of the box. This is used when drawingarrows between boxes.dbox draws a box with a cross,symbolizing a death state.boxarr draws an arrow between twoboxes, making sure it does not intersect the boxes. Only straightlines are drawn.

boxes.Lexis takes as input a Lexis object sets up an empty plotarea (with axes 0 to 100 in both directions) and ifboxpos=FALSE (the default) prompts you to click on thelocations for the state boxes, and then draws arrows implied by theactual transitions in theLexis object. The default is toannotate the transitions with the number of transitions.

A transition matrix can also be supplied, in which case the row/columnnames are used as state names, diagonal elements taken asperson-years, and off-diagonal elements as number of transitions.This also works forboxes.matrix.

Optionally returns the R-code reproducing the plot in a file, whichcan be useful if you want to produce exactly the same plot withdiffering arrow colors etc.

boxarr draws an arrow between two boxes, on the line connectingthe two box centers. Theoffset argument is used to offset thearrow a bit to the left (as seen in the direction of the arrow) on orderto accommodate arrows both ways between boxes.boxarr returns a namedlist with elementsx,y andd, where the two formergive the location of a point on the arrow used for printing (see argumentpos) and the latter is a unit vector in thedirection of the arrow, which is used byboxes.Lexis toposition the annotation of arrows with the number of transitions.

boxes.MS re-draws whatboxes.Lexis has done based on theobject of classMS produced byboxes.Lexis. The pointbeing that theMS object is easily modifiable, and thus it is amachinery to make variations of the plot with different colorannotations etc.

fill.arr is just a utility drawing nicer arrows than the defaultarrows command, basically by using filled arrow-heads; calledbyboxarr.

Value

The functionstbox anddbox return the location anddimension of the boxes,c(x,y,w,h), which are designed to be usedas input to theboxarr function.

Theboxarr function returns the coordinates (as a namedlist with namesx andy) of a point on thearrow, designated to be used for annotation of the arrow.

The functionboxes.Lexis returns anMS object, a list withfive elements: 1)Boxes - a data frame with one rowper box and columnsxx,yy,wd,ht,font,lwd,col.txt,col.border andcol.bg,2) an objectState.names with names of states (possibly anexpression, hence not possible to include as a column inBoxes),3) a matrixTmat, the transition matrix, 4) a dataframe,Arrows with one row per transition and columns:lwd.arr,col.arr,pos.arr,col.txt.arr,font.arr andoffset.arr and5) an objectArrowtext with names of states (possibly anexpression, hence not possible to include as a column inArrows)

AnMS object is used as input toboxes.MS, the primary use is to modify selected entries intheMS object first, e.g. colors, or supplysub-setting arguments in order to produce displays that have thesame structure, but with different colors etc.

Author(s)

Bendix Carstensen

Examples

par( mar=c(0,0,0,0), cex=1.5 )plot( NA,      bty="n",      xlim=0:1*100, ylim=0:1*100, xaxt="n", yaxt="n", xlab="", ylab="" )bw  <- tbox( "Well"    , 10, 60, 22, 10, col.txt="blue" )bo  <- tbox( "other Ca", 45, 80, 22, 10, col.txt="gray" )bc  <- tbox( "Ca"      , 45, 60, 22, 10, col.txt="red" )bd  <- tbox( "DM"      , 45, 40, 22, 10, col.txt="blue" )bcd <- tbox( "Ca + DM" , 80, 60, 22, 10, col.txt="gray" )bdc <- tbox( "DM + Ca" , 80, 40, 22, 10, col.txt="red" )      boxarr( bw, bo , col=gray(0.7), lwd=3 )# Note the argument adj= can takes values outside (0,1)text( boxarr( bw, bc , col="blue", lwd=3 ),      expression( lambda[Well] ), col="blue", adj=c(1,-0.2), cex=0.8 )      boxarr( bw, bd , col=gray(0.7) , lwd=3 )      boxarr( bc, bcd, col=gray(0.7) , lwd=3 )text( boxarr( bd, bdc, col="blue", lwd=3 ),      expression( lambda[DM] ), col="blue", adj=c(1.1,-0.2), cex=0.8 )# Set up a transition matrix allowing recoverytm <- rbind( c(NA,1,1), c(1,NA,1), c(NA,NA,NA) )rownames(tm) <- colnames(tm) <- c("Cancer","Recurrence","Dead")tmboxes.matrix( tm, boxpos=TRUE )# Illustrate texting of arrowsboxes.Lexis( tm, boxpos=TRUE, txt.arr=c("en","to","tre","fire") )zz <- boxes( tm, boxpos=TRUE, txt.arr=c(expression(lambda[C]),                                        expression(mu[C]),                                        "recovery",                                        expression(mu[R]) ) )# Change color of a boxzz$Boxes[3,c("col.bg","col.border")] <- "green"boxes( zz )# Set up a Lexis objectdata(DMlate)str(DMlate)dml <- Lexis( entry=list(Per=dodm, Age=dodm-dobth, DMdur=0 ),               exit=list(Per=dox),        exit.status=factor(!is.na(dodth),labels=c("DM","Dead")),               data=DMlate[1:1000,] )# Cut follow-up at Insulindmi <- cutLexis( dml, cut=dml$doins, new.state="Ins", pre="DM" )summary( dmi )boxes( dmi, boxpos=TRUE )boxes( dmi, boxpos=TRUE, show.BE=TRUE )boxes( dmi, boxpos=TRUE, show.BE="nz" )boxes( dmi, boxpos=TRUE, show.BE="nz", BE.sep=c("In:","      Out:","") )# Set up a bogus recovery date just to illustrate two-way transitionsdmi$dorec <- dmi$doins + runif(nrow(dmi),0.5,10)dmi$dorec[dmi$dorec>dmi$dox] <- NAdmR <- cutLexis( dmi, cut=dmi$dorec, new.state="DM", pre="Ins" )summary( dmR )boxes( dmR, boxpos=TRUE )boxes( dmR, boxpos=TRUE, show.D=FALSE )boxes( dmR, boxpos=TRUE, show.D=FALSE, show.Y=FALSE )boxes( dmR, boxpos=TRUE, scale.R=1000 )MSobj <- boxes( dmR, boxpos=TRUE, scale.R=1000, show.D=FALSE )MSobj <- boxes( dmR, boxpos=TRUE, scale.R=1000, DR.sep=c(" (",")") )class( MSobj )boxes( MSobj )MSobj$Boxes[1,c("col.txt","col.border")] <- "red"MSobj$Arrows[1:2,"col.arr"] <- "red"boxes( MSobj )

Bereavement in an elderly cohort

Description

Thebrv data frame has 399 rows and 11 columns.The data concern the possible effect of marital bereavement onsubsequent mortality. They arose from a survey of the physical andmental health of a cohort of 75-year-olds in one large generalpractice. These data concern mortality up to 1 January, 1990 (althoughfurther follow-up has now taken place).

Subjects included all lived with a living spouse when they entered thestudy. There are three distinct groups of such subjects: (1) those inwhich both members of the couple were over 75 and therefore included inthe cohort, (2) those whose spouse was below 75 (and was not, therefore,part of the main cohort study), and (3) those living in largerhouseholds (that is, not just with their spouse).

Format

This data frame contains the following columns:

id: subject identifier, a numeric vector
couple: couple identifier, a numeric vector
dob: date of birth, a date
doe: date of entry into follow-up study, a date
dox: date of exit from follow-up study, a date
dosp: date of death of spouse, a date (if the spouse was still aliveat the end of follow-up,this was coded to January 1, 2000)
fail: status at end of follow-up,a numeric vector (0=alive,1=dead)
group: see Description, a numeric vector
disab: disability score, a numeric vector
health: perceived health status score, a numeric vector
sex: a factor with levelsMale andFemale

Source

Jagger C, and Sutton CJ, Death after Marital Bereavement. Statistics inMedicine, 10:395-404, 1991. (Data supplied by Carol Jagger).

Examples

data(brv)

Functions to convert character, factor and various date objects into a number,and vice versa.

Description

Dates are converted to a numerical value, giving the calendar year asa fractional number. 1 January 1970 is converted to 1970.0, and otherdates are converted by assuming that years are all 365.25 days long,so inaccuracies may arise, for example, 1 Jan 2000 is converted to1999.999. Differences between converted values will be 1/365.25 of thedifference between correspondingDate objects.

Usage

  cal.yr( x, format="%Y-%m-%d", wh=NULL )  ## S3 method for class 'cal.yr'as.Date( x, ... )

Arguments

x

A factor or character vector, representing a date in formatformat, or an object of classDate,POSIXlt,POSIXct,date,dates orchron (the latter two requires thechron package).Ifx is a data frame, all variables in the data-framewhich are of one the classes mentioned are converted to classcal.yr.See arguemtwh, though.

format

Format of the date values ifx is factor or character.If this argument is supplied andx is a datafame, allcharacter variables are converted to classcal.yr.Factors in the dataframe will be ignored.

wh

Indices of the variables to convert ifx is a data frame.Can be either a numerical or character vector.

...

Arguments passed on from other methods.

Value

cal.yr returns a numerical vector of the same length asx, of classc("cal.yr","numeric"). Ifx is a data framea dataframe with some of the columns converted to class"cal.yr" isreturned.

as.Date.cal.yr returns aDate object.

Author(s)

Bendix Carstensen, Steno Diabetes Center Copenhagen,b@bxc.dk,http://bendixcarstensen.com

Examples

 # Character vector of dates: birth <- c("14/07/1852","01/04/1954","10/06/1987","16/05/1990",            "12/11/1980","01/01/1997","01/01/1998","01/01/1999") # Proper conversion to class "Date": birth.dat <- as.Date( birth, format="%d/%m/%Y" ) # Converson of character to class "cal.yr" bt.yr <- cal.yr( birth, format="%d/%m/%Y" ) # Back to class "Date": bt.dat <- as.Date( bt.yr ) # Numerical calculation of days since 1.1.1970: days <- Days <- (bt.yr-1970)*365.25 # Blunt assignment of class: class( Days ) <- "Date" # Then data.frame() to get readable output of results: data.frame( birth, birth.dat, bt.yr, bt.dat, days, Days, round(Days) )

Combining a Lexis objects with data frames or other Lexis objects

Description

A Lexis object may be combined side-by-side withdata frames. Or several Lexis objects may stacked, possibly increasingthe number of states and time scales.

Usage

## S3 method for class 'Lexis'cbind(...)## S3 method for class 'Lexis'rbind(...)

Arguments

...

Forcbind a sequence of data frames or vectors ofwhich exactly one has classLexis. Forrbind a sequenceof Lexis objects, supposedly representing follow-up in the samepopulation.

Details

Arguments torbind.Lexis must all beLexisobjects; except for possible NULL objects. The timescales in theresulting object will be the union of all timescales present in allarguments. Values of timescales not present in a contributing Lexisobject will be set toNA. Thebreaks for a given timescale will beNULL if thebreaks of the same time scalefrom two contributing Lexis objects are different.

The arguments tocbind.Lexis must consist of at most one Lexisobject, so the method is intended for amending a Lexis object withextra columns without losing the Lexis-specific attributes.

Value

ALexis object.rbind renders aLexisobject with timescales equal to the union of timescales in thearguments supplied. Values of a given timescale are set toNAfor rows corresponding to supplied objects.cbind basicallyjust adds columns to an existing Lexis object.

Author(s)

Bendix Carstensen,http://bendixcarstensen.com

Examples

# A small bogus cohortxcoh <- structure( list( id = c("A", "B", "C"),                      birth = c("14/07/1952", "01/04/1954", "10/06/1987"),                      entry = c("04/08/1965", "08/09/1972", "23/12/1991"),                       exit = c("27/06/1997", "23/05/1995", "24/07/1998"),                       fail = c(1, 0, 1) ),                     .Names = c("id", "birth", "entry", "exit", "fail"),                  row.names = c("1", "2", "3"),                      class = "data.frame" )# Convert the character dates into numerical variables (fractional years)xcoh <- cal.yr( xcoh, format="%d/%m/%Y", wh=2:4 )# See how it looksxcohstr( xcoh )# Define as Lexis object with timescales calendar time and ageLcoh <- Lexis( entry = list( per=entry ),                exit = list( per=exit, age=exit-birth ),         exit.status = fail,                data = xcoh )Lcohcbind( Lcoh, zz=3:5 )# Lexis object wit time since entry time scaleDcoh <- Lexis( entry = list( per=entry, tfe=0 ),                exit = list( per=exit ),         exit.status = fail,                data = xcoh )# A bit meningless to combie these two, really...rbind( Dcoh, Lcoh )# Split different placessL <- splitLexis( Lcoh, time.scale="age", breaks=0:20*5 )sD <- splitLexis( Dcoh, time.scale="tfe", breaks=0:50*2 )sDL <- rbind( sD, sL )

Generate a nested case-control study

Description

Given the basic outcome variables for a cohort study: the time of entryto the cohort, the time of exit and the reason for exit ("failure" or"censoring"), this function computes risk sets and generates a matchedcase-control study in which each case is compared with a set of controlsrandomly sampled from the appropriate risk set. Other variables may bematched when selecting controls.

Usage

ccwc( entry=0, exit, fail, origin=0, controls=1, match=list(),      include=list(), data=NULL, silent=FALSE )

Arguments

entry

Time of entry to follow-up

exit

Time of exit from follow-up

fail

Status on exit (1=Fail, 0=Censored)

origin

Origin of analysis time scale

controls

The number of controls to be selected for each case

match

List of categorical variables on which to match cases and controls

include

List of other variables to be carried across into the case-controlstudy

data

Data frame in which to look for input variables

silent

If FALSE, echos a . to the screen for each case-control setcreated; otherwise produces no output.

Value

The case-control study, as a dataframe containing:

Set

case-control set number

Map

row number of record in input dataframe

Time

failure time of the case in this set

Fail

failure status (1=case, 0=control)

These are followed by the matching variables, and finally by thevariables in theinclude list

Author(s)

David Clayton

References

Clayton and Hills, Statistical Models in Epidemiology, OxfordUniversity Press, Oxford:1993.

Examples

## For the diet and heart dataset, create a nested case-control study# using the age scale and matching on job#data(diet)dietcc <- ccwc( doe, dox, chd, origin=dob, controls=2, data=diet,                include=energy, match=job)

Compute cumulative risks and expected sojourn times from models forcause-specific rates.

Description

Consider a list of parametric models for rates of competing events, suchas different causes of death, A, B, C, say. From estimates of thecause-specific rates we can compute 1) the cumulative risk of being ineach state ('Surv' (=no event) and A, B and C) at different times, 2)the stacked cumulative rates such as A, A+C, A+C+Surv and 3) theexpected (truncated) sojourn times in each state up to each time point.

This can be done by simple numerical integration using estimates frommodels for the cause specific rates. But the standard errors of theresults are analytically intractable.

The functionci.Crisk computes estimates with confidenceintervals using simulated samples from the parameter vectors of suppliedmodel objects. Some call this a parametric bootstrap.

The times and other covariates determining the cause-specific rates mustbe supplied in a data frame which will be used for predicting rates forall transitions.

Usage

ci.Crisk(mods,           nd,         tnam = names(nd)[1],           nB = 1000,         perm = length(mods):0 + 1,        alpha = 0.05,       sim.res = 'none')

Arguments

mods

A named list ofglm/gam model objectsrepresenting the cause-specific rates. If the list is not named thefunction will crash. The names will be used as names for the states(competing risks), while the state without any event will be called"Surv".

nd

A data frame of prediction points and covariates to be usedon all models supplied inmods.

tnam

Name of the column innd which is the time scale.Itmust represent endpoints of equidistant intervals.

nB

Scalar. The number of simulations from the (posterior)distribution of the model parameters to be used in computingconfidence limits.

perm

Numerical vector of lengthlength(mods)+1 indicatingthe order in which states are to be stacked. The'Surv' stateis taken to be the first, the remaining in the reverse order suppliedin themods argument. The default is therefore to stack withthe survival as the first, which may not be what you normally want.

alpha

numeric. 1 minus the confidence level used in calculatingthe c.i.s

sim.res

Character. What simulation samples should bereturned. If'none' (the default) the function returns a listof 3 arrays (see under 'value'). If'rates' it returns anarray of dimensionnrow(nd) xlength(mod) xnBof bootstrap samples of the rates. If'crisk' it returns anarray of dimensionnrow(nd) xlength(mod)+1 xnB of bootstrap samples of the cumulative rates. Only thefirst letter matters, regardless of whether it is in upper lowercase.

Value

Ifsim.res='none' a named list with 4 components, the first 3are 3-way arrays classified by time, state and estimate/confidenceinterval:

Crisk Cumulative risks for thelength(mods)eventsand the survival
Srisk Stacked versions of the cumulative risks
Stime Sojourn times in each states
time Endpoints of intervals. It is just the numericalversion of the names of the first dimension of the three arrays

All three arrays have (almost) the same dimensions:

time, named astnam; endpoints of intervals. Lengthnrow(nd).
cause. The arraysCrisk andStime havevalues "Surv" plus the names of the listmods (firstargument).Srisk has lengthlength(mod), with eachlevel representing a cumulative sum of cumulative risks, in orderindicated by theperm argument.
Unnamed,ci.50%,ci.2.5%,ci.97.5%representing quantiles of the quantities derived from the bootstrapsamples. Ifalpha is different from 0.05, names are of coursedifferent.

Ifsim.res='rates' the function returns bootstrap samples ofrates for each cause as an arrayclassified by time, cause and bootstrap sample.

Ifsim.res='crisk' the function returns bootstrap samples ofcumulative risks for each cause (including survival) as an arrayclassified by time, state (= causes + surv) and bootstrap sample.

Author(s)

Bendix Carstensen,http://bendixcarstensen.com

Examples

library(Epi)data(DMlate)# A Lexis object for survivalLdm <- Lexis(entry = list( per = dodm,                           age = dodm-dobth,                            tfd = 0 ),              exit = list( per = dox ),       exit.status = factor( !is.na(dodth), labels = c("DM","Dead") ),              data = DMlate[sample(1:nrow(DMlate),1000),] )summary(Ldm, timeScales = TRUE)# Cut at OAD and Ins timesMdm <- mcutLexis(Ldm,                  wh = c('dooad','doins'),          new.states = c('OAD','Ins'),          seq.states = FALSE,                ties = TRUE)summary(Mdm$lex.dur)# restrict to DM state and splitSdm <- splitLexis(factorize(subset(Mdm, lex.Cst == "DM")),                  time.scale = "tfd",                  breaks = seq(0,20,1/12))summary(Sdm)summary(Relevel(Sdm, c(1, 4, 2, 3)))boxes(Relevel(Sdm, c(1, 4, 2, 3)),       boxpos = list(x = c(15, 85, 80, 15),                    y = c(85, 85, 20, 15)),      scale.R = 100)# glm models for the cause-specific ratessystem.time(mD <- glm.Lexis(Sdm, ~ Ns(tfd, knots=0:6*2), to = 'Dead') )system.time(mO <- glm.Lexis(Sdm, ~ Ns(tfd, knots=0:6*2), to = 'OAD' ) )system.time(mI <- glm.Lexis(Sdm, ~ Ns(tfd, knots=0:6*2), to = 'Ins' ) )# intervals for calculation of predicted ratesint <- 1 / 100nd <- data.frame(tfd = seq(0, 10, int)) # not the same as the split,                                         # and totally unrelated to it# cumulaive risks with confidence intervals# (too few timepoints, too few simluations)system.time(res <- ci.Crisk(list(OAD = mO,                      Ins = mI,                     Dead = mD),                            nd = data.frame(tfd = 0:100 / 10),                            nB = 100,                          perm = 4:1))str(res)

Compute cumulative sum of estimates.

Description

Computes the cumulative sum of parameter functions and thestandard error of it. Used for computing the cumulative rate, or thesurvival function based on aglm with parametric baseline.

Usage

ci.cum( obj,    ctr.mat = NULL,     subset = NULL,       intl = NULL,      alpha = 0.05,        Exp = TRUE,     ci.Exp = FALSE,     sample = FALSE )ci.surv( obj,    ctr.mat = NULL,     subset = NULL,       intl = NULL,      alpha = 0.05,        Exp = TRUE,     sample = FALSE )

Arguments

obj

A model object (of classlm,glm.

ctr.mat

Matrix or data frame.

Ifctr.mat is a matrix, it should be a contrast matrix to bemultiplied to the parameter vector, i.e. the desired linear function ofthe parameters.

If it is a data frame it should have columns corresponding to aprediction data frame forobj, see details forci.lin.

subset

Subset of the parameters of the model to which a matrixctr.mat should be applied.

intl

Interval length for the cumulation. Either a constant or anumerical vector of lengthnrow(ctr.mat). If omitted taken asthe difference between the two first elments if the first column inctr.mat, assuming that that holds the time scale. A note isissued in this case.

alpha

Significance level used when computing confidence limits.

Exp

Should the parameter function be exponentiated before it iscumulated?

ci.Exp

Should the confidence limits for the cumulative rate becomputed on the log-scale, thus ensuring that exp(-cum.rate) is alwaysin [0,1]?

sample

Should a sample of the original parameters be used tocompute a cumulative rate?

Details

The purpose ofci.cum is to the compute cumulative rate(integrated intensity) at a set of points based on a model for therates.ctr.mat is a matrix which, when premultiplied to theparameters of the model returns the (log)rates at a set of equidistanttime points. If log-rates are returned from the prediction method forthe model, the they should be exponentiated before cumulated, and thevariances computed accordingly. Since the primary use is for log-linearPoisson models theExp parameter defaults to TRUE.

Each row in the object supplied viactr.mat is assumed torepresent a midpoint of an interval.ci.cum will then return thecumulative rates at theend of these intervals.ci.survwill return the survival probability at thestart of each ofthese intervals, assuming the the first interval starts at 0 - the firstrow of the result isc(1,1,1).

Theci.Exp argument ensures that the confidence intervals for thecumulative rates are always positive, so that exp(-cum.rate) is alwaysin [0,1].

Value

A matrix with 3 columns: Estimate, lower and upper c.i. Ifsample is TRUE, a single sampled vector is returned, ifsample is numeric a matrix withsample columns isreturned, each column a cumulative rate based on a random sample fromthe distribution of the parameter estimates.

ci.surv returns a 3 column matrix with estimate, lower andupper confidence bounds for the survival function.

Author(s)

Bendix Carstensen,http://bendixcarstensen.com

Examples

# Packages required for this examplelibrary( splines )library( survival )data( lung )par( mfrow=c(1,2) )# Plot the Kaplan-meier-estimatorplot( survfit( Surv( time, status==2 ) ~ 1, data=lung ) )# Declare data as LexislungL <- Lexis(exit = list(tfd=time),               exit.status = (status == 2) * 1,               data = lung)summary(lungL)# Split the follow-up every 10 dayssL <- splitLexis(lungL, "tfd", breaks=seq(0,1100,10))summary(sL)# Fit a Poisson model with a natural spline for the effect of time (left# end points of intervals are used as covariate)mp <- glm(cbind(lex.Xst == 1, lex.dur)          ~ Ns(tfd,knots = c(0, 50, 100, 200, 400, 700)),          family = poisreg,            data = sL)# mp is now a model for the rates along the time scale tfd# prediction data frame for select time points on this time scalend <- data.frame(tfd = seq(5,995,10)) # *midpoints* of intervalsLambda <- ci.cum ( mp, nd, intl=10 )surv   <- ci.surv( mp, nd, intl=10 )# Put the estimated survival function on top of the KM-estimator# recall the ci.surv return the survival at *start* of intervalsmatshade(nd$tfd - 5, surv, col = "Red", alpha = 0.15)# Extract and plot the fitted intensity functionlambda <- ci.pred(mp, nd) * 365.25 # mortality matshade(nd$tfd, lambda, log = "y", ylim = c(0.2, 5), plot = TRUE,          xlab = "Time since diagnosis",          ylab = "Mortality per year")# same thing works with gam from mgcvlibrary(mgcv)mg <- gam(cbind(lex.Xst == 1, lex.dur) ~ s(tfd), family = poisreg, data=sL )matshade(nd$tfd - 5, ci.surv(mg, nd, intl=10), plot=TRUE,         xlab = "Days since diagnosis", ylab="P(survival)")matshade(nd$tfd  , ci.pred(mg, nd) * 365.25, plot=TRUE, log="y",         xlab = "Days since diagnosis", ylab="Mortality per 1 py")

Linear predictor (`eta`) from a formula, coefficients, vcov and aprediction frame.

Description

Computes the linear predictor with its confidence limits from themodel formula and the estimated parameters with the vcov.

Usage

ci.eta(form, cf, vcv, newdata,       name.check = TRUE,       alpha = 0.05, df = Inf, raw = FALSE)

Arguments

form

A model formula. A one-sided formula will suffice; leftside will be ignored if two-sided.

cf

Coefficients from a model usingformula.

vcv

variance-covariance matrix from a model usingformula.

newdata

Prediction data frame with variables used informula. Can also be a list of 2 or 4 prediction frames, fordetails seeci.lin.

name.check

Logical. Check if the column names of the genereatedmodel matrix are identical to the names of the suppliedcoef vector.

alpha

Significance level for calculation of c.i.

df

Integer. Number of degrees of freedom in the t-distributionused to compute the quantiles used to construct theconfidence intervals.

raw

Logical. Should predictions and their vcov be returnedinstead of predictions and confidence limits?

Details

Does pretty much the same asci.lin, but requires only aformula and coefficients with vcov and not a full modelobject. Designed to avoid saving entire (homongously large) modelobjects and still be able to compute predictions. But only the linearpredictor is returned, if there is a link in your model function it isyour own responsibility to back-transform. If the model formulacontains reference to vectors of spline knots or similar these must bein the global environment.

There is no guarantee that this function works for models that do notinherit fromlm. But there is a guarantee that it will not workforgam objects withs() terms.

Value

The linear predictor for thenewdata with a confidence intervalas anrow(newdata) by 3 matrix. Ifraw=TRUE, a list the linearpredictor (eta) and its variance-covariance matrix (var).

Author(s)

Bendix Carstensen,http://bendixcarstensen.com

Compute linear functions of parameters with standard errors andconfidence limits, optionally transforming to a different scale.

Description

For a given model object the function computes a linear function ofthe parameters and the corresponding standard errors, p-values andconfidence intervals.

Usage

ci.lin( obj,    ctr.mat = NULL,     subset = NULL,     subint = NULL,      xvars = NULL,      diffs = FALSE,       fnam = !diffs,       vcov = FALSE,      alpha = 0.05,         df = Inf,        Exp = FALSE,     sample = FALSE )ci.exp( ..., Exp = TRUE, pval = FALSE )Wald( obj, H0=0, ... )ci.mat( alpha = 0.05, df = Inf )ci.pred( obj,     newdata,         Exp = NULL,       alpha = 0.05 )ci.ratio( r1, r2,         se1 = NULL,         se2 = NULL,      log.tr = !is.null(se1) & !is.null(se2),       alpha = 0.05,        pval = FALSE )

Arguments

obj

A model object (in general of classglm, but forci.lin andci.exp it may also be of classlm,coxph,survreg,clogistic,cch,lme,mer,lmerMod,glmerMod,gls,nls,gnlm,MIresult,mipo,polr,orrq).

ctr.mat

Matrix, data frame or list (of two or four data frames).

Ifctr.mat is a matrix, it should be a contrast matrix to bemultiplied to the parameter vector, i.e. the desired linear functionof the parameters.

If it is a data frame it should have columns corresponding to aprediction frame, see details.

If it is a list, it must contain two or four data frames that are(possibly partial) prediction frames forobj, see argumentxvars and details.

xvars

Character vector. If quantitative variables in the modelare omitted from data frames supplied in a list toctr.mat,they should be listed here. Omitted factors need not be mentionedhere.

subset

The subset of the parameters to be used. If given as acharacter vector, the elements are in turn matched against theparameter names (usinggrep) to find the subset. Repeatparameters may result from using a character vector. This isconsidered a facility.

subint

Character.subset selection, but where eachelement of the character vector is used toselect a subset of parameters and only theintersectionof these is returned.

diffs

If TRUE, all differences between parametersin the subset are computed, and thesubset argument isrequired. The argumentctr.mat is ignored. Ifobjinherits fromlm, andsubset is given as a stringsubset is used to search among the factors in the model anddifferences of all factor levels for the first match are shown.Ifsubset does not match any of the factors in the model, allpairwise differences between parameters matching are returned.

fnam

Should the common part of the parameter names be includedwith the annotation of contrasts? Ignored ifdiffs==T. If astring is supplied this will be prefixed to the labels.

vcov

Should the covariance matrix of the set of parameters bereturned? If this is set,Exp is ignored. See details.

alpha

Significance level for the confidence intervals.

df

Integer. Number of degrees of freedom in the t-distribution usedto compute the quantiles used to construct the confidence intervals.

Exp

Forci.lin, ifTRUE columns 5:6 are replacedwith exp( columns 1,5,6 ). Forci.exp, ifFALSE, theuntransformed parameters are returned. Forci.pred itindicates whether the predictions should be exponentiated - thedefault (Exp=NULL) is to make a prediction with a Wald CI onthe scale of the linear predictor and back-transform it by theinverse link function; ifFALSE, the prediction on the linkscale is returned.

sample

Logical or numerical. IfTRUE or numerical asample of sizeas.numeric(sample) is drawn from themultivariate normal with mean equal to the (subset defined)coefficients and variance equal to the estimated variance-covarianceof these. These are then transformed byctr.mat andreturned.

pval

Logical. Should a column of P-values be included with theestimates and confidence intervals output byci.exp.

H0

Numeric. The null values for the selected/transformedparameters to be tested by a Wald test. Must have the same length asthe selected parameter vector.

...

Parameters passed on toci.lin.

newdata

Data frame of covariates where prediction is made.

r1,r2

Estimates of rates in two independent groups, withconfidence limits. Can be either 3-column matrices or data frameswith estimates and confidence intervals or 2 two column structureswith confidence limits only. Only the confidence limits are used.

se1,se2

Standard errors of log-rates in the two groups. Ifgiven, it is assumed thatr1 andr2 representlog-rates.

log.tr

Logical, if true, it is assumed thatr1 andr2 represent log-rates with confidence intervals.

Value

ci.lin returns a matrix with number of rows and row names asctr.mat. The columns are Estimate, Std.Err, z, P, 2.5% and97.5% (or according to the value ofalpha). Ifvcov=TRUE, instead a list of length 2 with componentscoef (avector), the desired functional of the parameters andvcov (asquare matrix), the variance covariance matrix of this, is returnedbut not printed. IfExp==TRUE the confidence intervals for theparameters are replaced with three columns: exp(estimate,c.i.).

ci.exp returns only the exponentiated parameter estimates withconfidence intervals. It is merely a wrapper forci.lin,fishing out the last 3 columns fromci.lin(...,Exp=TRUE). Ifyou just want the estimates and confidence limits, but notexponentiated, useci.exp(...,Exp=FALSE).

Ifctr.mat is a data frame, the model matrix corresponding tothis is constructed and supplied. This is only supported for objectsof classlm,glm,gam andcoxph.

So the default behaviour will be to produce the same asci.pred, apparently superfluous. The purpose of this is toallow the use of the argumentsvcov that produces thevariance-covariance matrix of the predictions, andsample thatproduces a sample of predictions using sampling from the multivariatenormal with mean equal to parameters and variance equal to thehessian.

Ifctr.mat is a list of two data frames, the difference of thepredictions from using the first versus the last as newdata argumentsto predict is computed. Columns that would be identical in the twodata frames can be omitted (see below), but names of numericalvariables omitted must be supplied in a character vectorxvars. Factors omitted need not be named.

If the second data frame has only one row, this is replicated to matchthe number of rows in the first. This facility is primarily aimed atteasing out RRs that are non-linear functions of a quantitativevariable without setting up contrast matrices using the same code asin the model. Note however if splines are used with computed knotsstored in a vector such asNs(x,knots=kk) then thekkmust be available in the (global) environment; it will not be foundinside the model object. In practical terms it means that if you savemodel objects for later prediction you should save the knots used inthe spline setups too.

Ifctr.mat is a list of four data frames, the difference of thedifference of predictions from using the first and second versusdifference of predictions from using the third and fourth is computed.Simply(pr1-pr2) - (pr3-pr4) with obvious notation. Useful toderive esoteric interaction effects.

Finally, only argumentsExp,vcov,alpha andsample fromci.lin are honored whenctr.mat is adata frame or a list of two data frames.

You can leave out variables (columns) from the two data frames thatwould be identical, basically variables not relevant for thecalculation of the contrast. In many casesci.lin (reallyEpi:::ci.dfr) can figure out the names of the omitted columns,but occasionally you will have to supply the names of the omittedvariables in thexvars argument. Factors omitted need not belisted inxvars, although no harm is done doing so.

Wald computes a Wald test for a subset of (possibly linearcombinations of) parameters being equal to the vector of nullvalues as given byH0. The selection of the subset ofparameters is the same as forci.lin. Using thectr.matargument makes it possible to do a Wald test for equality ofparameters.Wald returns a named numerical vector of length 3,with namesChisq,d.f. andP.

ci.mat returns a 2 by 3 matrix with rowsc(1,0,0) andc(0,-1,1)*1.96, devised to post-multiply to a p by 2 matrix withcolumns of estimates and standard errors, so as to produce a p by 3 matrixof estimates and confidence limits. Used internally inci.lin andci.cum.The 1.96 is replaced by the appropriate quantile from the normal ort-distribution when argumentsalpha and/ordf are given.

ci.pred returns a 3-column matrix with estimates and upper andlower confidence intervals as columns. This is just a conveniencewrapper forpredict.glm(obj,se.fit=TRUE) which returns a ratherunhandy structure. The prediction with c.i. is made in thelinkscale, and by default transformed by the inverse link, since the mostcommon use for this is for multiplicative Poisson or binomial modelswith either log or logit link.

ci.ratio returns the rate-ratio of two independent set ofrates given with confidence intervals or s.e.s. Ifse1 andse2 are given andlog.tr=FALSE it is assumed thatr1 andr2 are rates andse1 andse2 arestandard errors of the log-rates.

Author(s)

Bendix Carstensen,http://bendixcarstensen.com & Michael Hills

Examples

# Bogus data:f <- factor( sample( letters[1:5], 200, replace=TRUE ) )g <- factor( sample( letters[1:3], 200, replace=TRUE ) )x <- rnorm( 200 )y <- 7 + as.integer( f ) * 3 + 2 * x + 1.7 * rnorm( 200 )# Fit a simple model:mm <- lm( y ~ x + f + g )ci.lin( mm )ci.lin( mm, subset=3:6, diff=TRUE, fnam=FALSE )ci.lin( mm, subset=3:6, diff=TRUE, fnam=TRUE )ci.lin( mm, subset="f", diff=TRUE, fnam="f levels:" )print( ci.lin( mm, subset="g", diff=TRUE, fnam="gee!:", vcov=TRUE ) )# Use character defined subset to get ALL contrasts:ci.lin( mm, subset="f", diff=TRUE )# Suppose the x-effect differs across levels of g:mi <- update( mm, . ~ . + g:x )ci.lin( mi )# RR a vs. b by x:nda <- data.frame( x=-3:3, g="a", f="b" )ndb <- data.frame( x=-3:3, g="b", f="b" )# ci.lin( mi, list(nda,ndb) )# Same result if f column is omitted because "f" columns are identicalci.lin( mi, list(nda[,-3],ndb[,-3]) )# However, crashes if knots in spline is supplied, and non-factor omittedxk <- -1:1xi <- c(-0.5,0.5)ww <- rnorm(200)mi <- update( mm, . ~ . -x + ww + Ns(x,knots=xk) + g:Ns(x,knots=xi) )# Will crash try( cbind( nda$x, ci.lin( mi, list(nda,ndb) ) ) )# Must specify num vars (not factors) omitted from nda, ndbcbind( nda$x, ci.lin( mi, list(nda,ndb), xvars="ww" ) )# A Wald test of whether the g-parameters are 0Wald( mm, subset="g" )# Wald test of whether the three first f-parameters are equal:( CM <- rbind( c(1,-1,0,0), c(1,0,-1,0)) )Wald( mm, subset="f", ctr.mat=CM )# or alternatively( CM <- rbind( c(1,-1,0,0), c(0,1,-1,0)) )Wald( mm, subset="f", ctr.mat=CM )# Confidence intervals for ratio of rates# Rates and ci supplied, but only the range (lower and upper ci) is usedci.ratio( cbind(10,8,12.5), cbind(5,4,6.25) )ci.ratio( cbind(8,12.5), cbind(4,6.25) )# Beware of the offset when making predictions with ci.pred and ci.exp## Not run: library( mgcv )data( mortDK )m.arg  <- glm( dt ~ age , offset=log(risk) , family=poisson, data=mortDK )m.form <- glm( dt ~ age + offset(log(risk)), family=poisson, data=mortDK )a.arg  <- gam( dt ~ age , offset=log(risk) , family=poisson, data=mortDK )a.form <- gam( dt ~ age + offset(log(risk)), family=poisson, data=mortDK )nd <- data.frame( age=60:65, risk=100 )round( ci.pred( m.arg , nd ), 4 )round( ci.pred( m.form, nd ), 4 )round( ci.exp ( m.arg , nd ), 4 )round( ci.exp ( m.form, nd ), 4 )round( ci.pred( a.arg , nd ), 4 )round( ci.pred( a.form, nd ), 4 )round( ci.exp ( a.arg , nd ), 4 )round( ci.exp ( a.form, nd ), 4 )nd <- data.frame( age=60:65 )try( ci.pred( m.arg , nd ) )try( ci.pred( m.form, nd ) )try( ci.exp ( m.arg , nd ) )try( ci.exp ( m.form, nd ) )try( ci.pred( a.arg , nd ) )try( ci.pred( a.form, nd ) )try( ci.exp ( a.arg , nd ) )try( ci.exp ( a.form, nd ) )## End(Not run)# The offset may be given as an argument (offset=log(risk))# or as a term (+offset(log)), and depending on whether we are using a# glm or a gam Poisson model and whether we use ci.pred or ci.exp to# predict rates the offset is either used or ignored and either# required or not; the state of affairs can be summarized as:##                     offset#                     -------------------------------------#                     usage                 required?#                     ------------------    ---------------                      # function  model     argument   term       argument   term# ---------------------------------------------------------# ci.pred   glm       used       used       yes        yes#           gam       ignored    used       no         yes#        # ci.exp    glm       ignored    ignored    no         yes#           gam       ignored    ignored    no         yes# ---------------------------------------------------------

Compute confidence limits for a difference of two independent proportions.

Description

The usual formula for the c.i. of at difference of proportions isinaccurate. Newcombe has compared 11 methods and method 10 in hispaper looks like a winner. It is implemented here.

Usage

ci.pd(aa, bb=NULL, cc=NULL, dd=NULL,     method = "Nc",      alpha = 0.05, conf.level=0.95,     digits = 3,      print = TRUE,detail.labs = FALSE )

Arguments

aa

Numeric vector of successes in sample 1. Can also be amatrix or array (see details).

bb

Successes in sample 2.

cc

Failures in sample 1.

dd

Failures in sample 2.

method

Method to use for calculation of confidence interval, see"Details".

alpha

Significance level

conf.level

Confidence level

print

Should an account of the two by two table be printed.

digits

How many digits should the result be rounded to if printed.

detail.labs

Should the computing of probability differences bereported in the labels.

Details

Implements method 10 from Newcombe(1998) (method="Nc") or fromAgresti & Caffo(2000) (method="AC").

aa,bb,cc anddd can be vectors.Ifaa is a matrix, the elements[1:2,1:2] are used, withsuccessesaa[,1:2]. Ifaa is a three-way table or array,the elementsaa[1:2,1:2,] are used.

Value

A matrix with three columns: probability difference, lower and upperlimit. The number of rows equals the length of the vectorsaa,bb,cc anddd or, ifaa is a 3-way matrix,dim(aa)[3].

Author(s)

Bendix Carstensen, Esa Laara.http://bendixcarstensen.com

References

RG Newcombe: Interval estimation for the difference betweenindependent proportions. Comparison of eleven methods. Statistics inMedicine, 17, pp. 873-890, 1998.

A Agresti & B Caffo: Simple and effective confidence intervals forproportions and differences of proportions result from adding twosuccesses and two failures. The American Statistician,54(4), pp. 280-288, 2000.

Examples

( a <- matrix( sample( 10:40, 4 ), 2, 2 ) )ci.pd( a )twoby2( t(a) )prop.test( t(a) )( A <- array( sample( 10:40, 20 ), dim=c(2,2,5) ) )ci.pd( A )ci.pd( A, detail.labs=TRUE, digits=3 )

Conditional logistic regression

Description

Estimates a logistic regression model by maximizing the conditionallikelihood. The conditional likelihood calculations are exact, andscale efficiently to strata with large numbers of cases.

Usage

clogistic(formula, strata, data, subset, na.action, init,model = TRUE, x = FALSE, y = TRUE, contrasts = NULL,iter.max=20, eps=1e-6, toler.chol = sqrt(.Machine$double.eps))

Arguments

formula

Model formula

strata

Factor describing membership of strata for conditioning

data

data frame containing the variables in the formula andstrata arguments

subset

subset of records to use

na.action

missing value handling

init

initial values

model

a logical value indicating whethermodel frameshould be included as a component of the returned value

x,y

logical values indicating whether the response vector and modelmatrix used in the fitting process should be returned as componentsof the returned value.

contrasts

an optional list. See thecontrasts.arg ofmodel.matrix.default

iter.max

maximum number of iterations

eps

Convergence tolerance. Iteration continues until the relativechange in the conditional log likelihood is less thaneps.Must be positive.

toler.chol

Tolerance used for detection of a singularity during a Choleskydecomposition of the variance matrix. This is used to detectredundant predictor variables. Must be less thaneps.

Value

An object of class"clogistic". This is a list containingthe following components:

coefficients

the estimates of the log-odds ratio parameters. If the model isover-determined there will be missing values in the vector correspondingto the redundant columns in the model matrix.

var

the variance matrix of the coefficients. Rows and columns corresponding to any missing coefficients are set to zero.

loglik

a vector of length 2 containing the log-likelihood with the initialvalues and with the final values of the coefficients.

iter

number of iterations used.

n

number of observations used. Observations may be droppedeither because they are missing, or because they belong to ahomogeneous stratum. For more details on which observations wereused, seeinformative below.

informative

ifmodel=TRUE, a logical vector of length equal to the numberof rows in the model frame. This indicates whether an observationis informative. Strata that are homogeneous with respect to eitherthe outcome variable or the predictor variables are uninformative,in the sense that they can be removed without modifying theestimates or standard errors. Ifmodel=FALSE, this is NULL.

The output will also contain the following, for documentation see theglm object:terms,formula,call,contrasts,xlevels, and, optionally,x,y, and/orframe.

Author(s)

Martyn Plummer

Examples

  data(bdendo)  clogistic(d ~ cest + dur, strata=set, data=bdendo)

Contrast matrices

Description

Return a matrix of contrasts for factor coding.

Usage

  contr.cum(n)  contr.diff(n)  contr.2nd(n)  contr.orth(n)

Arguments

n

A vector of levels for a factor, or the number of levels.

Details

These functions are used for creating contrast matrices for use infitting regression models. The columns of theresulting matrices contain contrasts which can be used for coding afactor withn levels.

contr.cum gives a coding corresponding to successive differencesbetween factor levels.

contr.diff gives a coding that correspond to the cumulative sumof the value for each level. This is not meaningful in a model where theintercept is included, thereforen columns ia always returned.

contr.2nd gives contrasts corresponding to 2nd order differencesbetween factor levels. Returns a matrix withn-2 columns.

contr.orth gives a matrix withn-2 columns, which aremutually orthogonal and orthogonal to the matrixcbind(1,1:n)

Value

A matrix withn rows andk columns, withk=n forcontr.diffk=n-1 forcontr.cumk=n-2 forcontr.2nd andcontr.orth.

Author(s)

Bendix Carstensen

Examples

contr.cum(6)contr.2nd(6)contr.diff(6)contr.orth(6)

Fit a competing risks regression model (Fine-Gray model) using aLexis object)

Description

Fits a competing risks regression model using aLexisobject assuming that every person enters at time 0 and exits at timelex.dur. Thus is only meaningful for Lexis objects with one recordper person, (so far).

Usage

crr.Lexis( obj, mod, quiet=FALSE, ...)

Arguments

obj

A Lexis object; variables inmod are taken from this.

mod

Formula, with the l.h.s. a character constant equal to alevel ofobj$lex.Xst, and the r.h.s. a model formulainterpreted inobj.

quiet

Logical indicating whether a brief summary should be printed.

...

Further arguments passed on tocrr.

Details

This function is a simple wrapper forcrr, allowing aformula-specification of the model (which allows specifications ofcovariates on the fly), and utilizing the structure of Lexisobjects to simplify specification of the outcome. Prints a summary ofthe levels used as event, competing events and censoring.

By the structure of theLexis object it is not necessaryto indicate what the censoring code or competing events are, that isautomatically derived from theLexis object.

Currently only one state is allowed as l.h.s. (response) inmod.

Value

Acrr object (which is a list), with two extraelements in the list,model.Lexis - the model formula supplied,andtransitions - a table of transitions and censorings showingwhich transition was analysed and which were taken as competing events.

Author(s)

Bendix Carstensen,http://bendixcarstensen.com

Examples

# Thorotrats patients, different histological types of liver cancer# Load thorotrast data, and restrict to exposeddata(thoro)tht <- thoro[thoro$contrast==1,]# Define exitdate as the date of livercancertht$dox <- pmin( tht$liverdat, tht$exitdat, na.rm=TRUE )tht <- subset( tht, dox > injecdat )# Convert to calendar years in datestht <- cal.yr( tht )# Set up a Lexis object with three subtypes of liver cancer and deaththt.L <- Lexis( entry = list( per = injecdat,                              tfi = 0 ),                 exit = list( per = dox ),          exit.status = factor( 1*hepcc+2*chola+3*hmang+                                4*(hepcc+chola+hmang==0 & exitstat==1),                                labels=c("No cancer","hepcc","chola","hmang","Dead") ),                 data = tht )summary( tht.L )# Show the transitionsboxes( tht.L, boxpos=list(x=c(20,rep(80,3),30),                          y=c(60,90,60,30,10) ),              show.BE="nz", scale.R=1000 )# Fit a model for the Hepatocellular Carcinoma as outcome# - note that you can create a variable on the fly:library( cmprsk )hepcc <- crr.Lexis( tht.L, "hepcc" ~ volume + I(injecdat-1940) )hepcc$model.Lexishepcc$transitions# Models for the three other outcomes:chola <- crr.Lexis( tht.L, "chola" ~ volume + I(injecdat-1940) )hmang <- crr.Lexis( tht.L, "hmang" ~ volume + I(injecdat-1940) )dead  <- crr.Lexis( tht.L, "Dead"  ~ volume + I(injecdat-1940) )# Compare the effects# NOTE: This is not necessarily a joint model for all transitions.zz <- rbind( ci.exp(hepcc),             ci.exp(chola),             ci.exp(hmang),             ci.exp(dead) )zz <- cbind( zz[c(1,3,5,7)  ,],             zz[c(1,3,5,7)+1,] )rownames( zz ) <- c("hepcc","chola","hmang","dead")colnames( zz )[c(1,4)] <- rownames( ci.exp(chola) )round( zz, 3 )

Cut follow-up at a specified date for each person.

Description

Follow-up intervals in a Lexis object are divided into twosub-intervals: one before and one after an intermediate event. Theintermediate event may denote a change of state, in which case theentry and exit status variables in the split Lexis object aremodified.

Usage

cutLexis( data, cut, timescale = 1,                     new.state = nlevels(data$lex.Cst)+1,                     new.scale = FALSE,                  split.states = FALSE,                   progressive = FALSE,              precursor.states = transient(data),                         count = FALSE )countLexis( data, cut, timescale = 1 )

Arguments

data

ALexis object.

cut

A numeric vector with the times of the intermediate event.If a time is missing (NA) then the event is assumed to occurat timeInf.cut can also be a dataframe, see details.

timescale

The timescale thatcut refers to. Numeric or character.

new.state

The state to which a transition occur at timecut. It may be a single value, which is then applied to allrows ofdata, or a vector with a separate value for each row

new.scale

Name of the timescale defined as "time since entry tonew.state". IfTRUE a name for the new scale is constructed.See details.

split.states

Should states that are not precursor states be splitaccording to whether the intermediate event has occurred.

progressive

a logical flag that determines the changes to exitstatus. See details.

precursor.states

an optional vector of states to be consideredas "less severe" thannew.state. See Details below

count

logical indicating whether thecountLexis options shouldbe used. Specifyingcount=TRUE amounts to callingcountLexis,in which case the argumentsnew.state,progressive andprecursor.states will be ignored.

Details

ThecutLexis function allows a number of different waysof specifying the cutpoints and of modifying the status variable.

If thecut argument is a dataframe it must have columnslex.id,cut andnew.state. The values oflex.id must be unique.In this case it is assumed that each row represents a cutpoint (on thetimescale indicated in the argumenttimescale). This cutpoint willbe applied to all records indata with the correspondinglex.id.This makes it possible to applycutLexis to a splitLexis object.

If anew.state argument is supplied, the status variable isonly modified at the time of the cut point. However, it is oftenuseful to modify the status variable after the cutpoint when animportant event occurs. There are three distinct ways of doing this.

If theprogressive=TRUE argument is given, then a "progressive"model is assumed, in which the status can either remain the same orincrease during follow-up, but never decrease. This assumes that thestate variableslex.Cst andlex.Xst are either numeric orordered factors. In this case, ifnew.state=X, then any exit status with a value less thanX is replaced withX. The Lexis objectmust already be progressive, so that there are no rows for which theexit status is less than the entry status. Iflex.Cst andlex.Xst are factors they must be ordered factors ifprogressive=TRUE is given.

As an alternative to theprogressive argument, an explicitvector of precursor states, that are considered less severe than thenew state, may be given. Ifnew.state=X andprecursor.states=c(Y,Z) then any exit status ofY orZ in the second interval is replaced withX and allother values for the exit status are retained.

ThecountLexis function is a variant ofcutLexis whenthe cutpoint marks a recurrent event, and the status variable is usedto count the number of events that have occurred. Times given incutrepresent times of new events. Splitting withcountLexis increases the status variable by 1. If the currentstatus isX and the exit status isY before cutting,then after cutting the entry status isX,X+1 forthe first and second intervals, respectively, and the exit status isX+1,Y+1 respectively. Moreover the values of the statusis increased by 1 for all intervals for all intervals after the cutfor the person in question. Hence, a call tocountLexis isneeded for as many times as the person with most events. But also itis immaterial in what order the cutpoints are entered.

Value

ALexis object, for which each follow-up interval containingthe cutpoint is split in two: one before and one after thecutpoint. Any record representing follow up after the cutpoint has itsvalue oflex.Cst updated to the new state. An extra time-scaleis added; the time since the event atcut. This time scale willbeNA for any follow-up prior to the intermediate event.

The functiontsNA20 will replace all missing values intimescales with 0. This is commonly meeded when timescales defined astime since entry into an intermediate state are used in modeling. Butyou do not want to do that permanently in the cut data frame.

Note

ThecutLexis function superficially resembles thesplitLexis function. However, thesplitLexis functionsplits on a vector of common cut-points for all rows of the Lexisobject, whereas thecutLexis function splits on a single timepoint, which may be distinct for each row, modifies the statusvariables, adds a new timescale and updates the attribute"time.since". This attribute is a character vector of the same lengthas the "time.scales" attribute, whose value is '""' if thecorresponding timescale is defined for any piece of follow-up, and ifthe corresponding time scale is defined by saycutLexis(obj,new.state="A",new.scale=TRUE), it has the value"A".

Author(s)

Bendix Carstensen, Steno Diabetes Center,b@bxc.dk,Martyn Plummer,martyn.plummer@r-project.org

Examples

# A small artificial examplexx <- Lexis( entry=list(age=c(17,24,33,29),per=c(1920,1933,1930,1929)),             duration=c(23,57,12,15), exit.status=c(1,2,1,2) )xxcut <- c(33,47,29,50)cutLexis(xx, cut, new.state=3, precursor=1)cutLexis(xx, cut, new.state=3, precursor=2)cutLexis(xx, cut, new.state=3, precursor=1:2)# The same as the last examplecutLexis(xx, cut, new.state=3)# The same example with a factor status variableyy <- Lexis(entry = list(age=c(17,24,33,29),per=c(1920,1933,1930,1929)),            duration = c(23,57,12,15),            entry.status = factor(rep("alpha",4),            levels=c("alpha","beta","gamma")),            exit.status = factor(c("alpha","beta","alpha","beta"),            levels=c("alpha","beta","gamma")))cutLexis(yy,c(33,47,29,50),precursor="alpha",new.state="gamma")cutLexis(yy,c(33,47,29,50),precursor=c("alpha","beta"),new.state="aleph")## Using a dataframe as cut argumentrl <- data.frame( lex.id=1:3, cut=c(19,53,26), timescale="age", new.state=3 )rlcutLexis(xx, rl )cutLexis(xx, rl, precursor = 1)cutLexis(xx, rl, precursor = 0:2)## It is immaterial in what order splitting and cutting is donexs <- splitLexis( xx, breaks=seq(0,100,10), time.scale="age")xsxsC <- cutLexis(xs, rl, precursor = 0 )xC <- cutLexis(xx, rl, pre = 0)xCxCs <- splitLexis(xC, breaks=seq(0, 100, 10), time.scale = "age")xCsstr(xCs)

Projection of a model matrix on the orthogonalcomplement of a trend or curvature.

Description

The columns of a model matrixM is projected on theorthogonal complement to the matrix(1,t),resp.(1,t,t^2).

Orthogonality is w.r.t. an inner product defined by the positivedefinite matrix matrixdiag(weight). Non-diagonal matricesdefining the inner product is not supported.

Usage

  detrend( M, t, weight = rep(1, nrow(M)) )  decurve( M, t, weight = rep(1, nrow(M)) )

Arguments

M

A model matrix.

t

The trend defining a subspace. A numerical vector of lengthnrow(M).

weight

Weights defining the inner product of vectorsxandy assum(x*w*y).A numerical vector of lengthnrow(M), defaults to a vector of1s. Must be all non-negative.

Details

The functions are intended to be used in construction of particularparametrizations of age-period-cohort models.

Value

detrend returns full-rank matrix with columns orthogonal to(1,t);decurve returns full-rank matrix with columns orthogonal to(1,t,t^2).

Author(s)

Bendix Carstensen, Steno Diabetes Center Copenhagen,http://bendixcarstensen.com, with essential help from Peter Dalgaard.

Diet and heart data

Description

Thediet data frame has 337 rows and 14 columns.The data concern a subsample of subjects drawn from larger cohortstudies of the incidence of coronary heart disease (CHD). These subjectshad all completed a 7-day weighed dietary survey while taking part invalidation studies of dietary questionnaire methods. Upon the closure ofthe MRC Social Medicine Unit, from where these studies were directed, itwas found that 46 CHD events had occurred in this group, thus allowing aserendipitous study of the relationship between diet and the incidenceof CHD.

Format

This data frame contains the following columns:

`id`:	subject identifier, a numeric vector.
`doe`:	date of entry into follow-up study, a`Date` variable.
`dox`:	date of exit from the follow-up study, a`Date` variable.
`dob`:	date of birth, a`Date` variable.
`y`:	number of years at risk, a numeric vector.
`fail`:	status on exit, a numeric vector (codes 1, 3 and13 represent CHD events)
`job`:	occupation, a factor with levels`DriverConductorBank worker`
`month`:	month of dietary survey, a numeric vector
`energy`:	total energy intake (kCal per day/100), a numericvector
`height`:	(cm), a numeric vector
`weight`:	(kg), a numeric vector
`fat`:	fat intake (10 g/day), a numeric vector
`fibre`:	dietary fibre intake (10 g/day), a numeric vector
`energy.grp`:	high daily energy intake, a factor with levels`<=2750 KCal>2750 KCal`
`chd`:	CHD event, a numeric vector (1=CHD event, 0=no event)

Source

The data are described and used extensively by Clayton and Hills,Statistical Models in Epidemiology, Oxford University Press,Oxford:1993. They were rescued from destruction by David Clayton andreentered from paper printouts.

Examples

data(diet)# Illustrate the follow-up in a Lexis diagramLexis.diagram( age=c(30,75), date=c(1965,1990),               entry.date=cal.yr(doe), exit.date=cal.yr(dox), birth.date=cal.yr(dob),                fail=(fail>0), pch.fail=c(NA,16), col.fail=c(NA,"red"), cex.fail=1.0,               data=diet )

Function to calculate effects

Description

The function calculates the effects of an exposure on a response,possibly stratified by a stratifying variable, and/or controlled for oneor more confounding variables.

Usage

effx( response, type = "metric",         fup = NULL,    exposure,      strata = NULL,     control = NULL,             weights = NULL,                 eff = NULL,       alpha = 0.05,        base = 1,      digits = 3,        data = NULL )

Arguments

response

Theresponse variable - must be numeric orlogical. If logical,TRUE is considered the outcome.

type

The type of responsetype - must be one of "metric","binary", "failure", or "count"

fup

Thefup variable contains the follow-up time for afailure response. This must be numeric.

exposure

Theexposure variable can be numeric or a factor

strata

Thestrata stratifying variable - must be a factor

control

Thecontrol variable(s) (confounders) - these arepassed as a list if there are more than one.

weights

Frequency weights for binary response only

eff

How should effects be measured. Ifresponse isbinomial, the default is "OR" (odds-ratio) with "RR" (relative risk)as an option. Ifresponse is failure, the default is "RR"(rate-ratio) with "RD" (rate difference) as an option.

base

Baseline for the effects of a categorical exposure, either anumber or a name of the level. Defaults to 1

digits

Number of significant digits for the effects, default 3

alpha

1 - confidence level

data

data refers to the data used to evaluate the function

Details

The function is a wrapper for glm. Effects are calculated asdifferences in means for a metric response, odds ratios/relative risksfor a binary response, and rate ratios/rate differences for a failureor count response.

The k-1 effects for a categorical exposure with k levels are relativeto a baseline which, by default, is the first level. The effect of ametric (quantitative) exposure is calculated per unit of exposure.

The exposure variable can be numeric or a factor, but if it is anordered factor the order will be ignored.

Value

comp1

Effects of exposure

comp2

Tests of significance

Author(s)

Michael Hills (*1934-Jun-07, +2021-Jan-07)

Examples

library(Epi)data(births)births$hyp <- factor(births$hyp,labels=c("normal","hyper"))births$sex <- factor(births$sex,labels=c("M","F"))# bweight is the birth weight of the baby in gms, and is a metric# response (the default)# effect of hypertension on birth weighteffx(bweight,exposure=hyp,data=births)# effect of hypertension on birth weight stratified by sexeffx(bweight,exposure=hyp,strata=sex,data=births)# effect of hypertension on birth weight controlled for sexeffx(bweight,exposure=hyp,control=sex,data=births)print( options('na.action') )# effect of gestation time on birth weighteffx(bweight,exposure=gestwks,data=births)# effect of gestation time on birth weight stratified by sexeffx(bweight,exposure=gestwks,strata=sex,data=births)# effect of gestation time on birth weight controlled for sexeffx(bweight,exposure=gestwks,control=sex,data=births)# lowbw is a binary response coded 1 for low birth weight and 0 otherwise# effect of hypertension on low birth weighteffx(lowbw,type="binary",exposure=hyp,data=births)effx(lowbw,type="binary",exposure=hyp,eff="RR",data=births)

Function to calculate effects for individually matched case-control studies

Description

The function calculates the effects of an exposure on a response,possibly stratified by a stratifying variable, and/or controlled for oneor more confounding variables.

Usage

effx.match(response,exposure,match,strata=NULL,control=NULL,base=1,digits=3,alpha=0.05,data=NULL)

Arguments

response

Theresponse variable - must be numeric

exposure

Theexposure variable can be numeric or a factor

match

The variable which identifies the matched sets

strata

Thestrata stratifying variable - must be a factor

control

Thecontrol variable(s). These are passed as alist if there are more than one of them.

base

Baseline for the effects of a categorical exposure, default 1

digits

Number of significant digits for the effects, default 3

alpha

1 - confidence level

data

data refers to the data used to evaluate the function

Details

Effects are calculated odds ratios.The function is a wrapper for clogit, from the survival package.The k-1 effects for a categorical exposure with k levels are relative to a baseline which, by default, is the first level. The effect of a metric (quantitative) exposure is calculated per unit of exposure.The exposure variable can be numeric or a factor, but if it is an ordered factor the order will be ignored.

Value

comp1

Effects of exposure

comp2

Tests of significance

Author(s)

Michael Hills

References

www.mhills.pwp.blueyonder.co.uk

Examples

library(Epi)library(survival)data(bdendo)# d is the case-control variable, set is the matching variable.# The variable est is a factor and refers to estrogen use (no,yes)# The variable hyp is a factor with 2 levels and refers to hypertension (no, yes)# effect of est on the odds of being a caseeffx.match(d,exposure=est,match=set,data=bdendo)# effect of est on the odds of being a case, stratified by hypeffx.match(d,exposure=est,match=set,strata=hyp,data=bdendo)# effect of est on the odds of being a case, controlled for hypeffx.match(d,exposure=est,match=set,control=hyp,data=bdendo)

Time series and other methods for Lexis objects

Description

Extract the entry time, exit time, status or duration of follow-up from aLexis object. Classify states.

Usage

     entry(x, time.scale = NULL, by.id=FALSE)      exit(x, time.scale = NULL, by.id=FALSE)    status(x, at="exit"        , by.id=FALSE)       dur(x,                    by.id=FALSE) transient(x) absorbing(x) preceding(x, states)    before(x, states)succeeding(x, states)     after(x, states)

Arguments

x

an object of classLexis.

time.scale

a string or integer indicating the time scale. Ifomitted, all times scales are used.

by.id

Logical, ifTRUE, only one record per unique valueoflex.id is returned; either the first, the last, or fordur, the sum oflex.dur. IfTRUE, the returnedobject have thelex.id as (row)names attribute.

at

string indicating the time point(s) at which status is to bemeasured. Possible values are "exit" or "entry".

states

Character vector of states.

Value

Theentry andexit functions return a vector ofentry times and exit times, respectively, on the requested timescale. If multiple time scales are requested, a matrix isreturned.

Thestatus function returns a vector giving the status at"at" (either 'entry' or 'exit') anddurreturns a vector with the lengths of the follow-up intervals.

entry,exit,status anddur return vectorsof lengthnrow(x) ifby.id=FALSE; ifby.id=TRUE avector of lengthlength(unique(lex.id)).

The functionstransient andabsorbing return charactervectors of the transient, resp. absorbing states inx. Theseare necessarily disjoint but the union may be a proper subset oflevels(x), since the latter may have levels that are neverassumed by eitherlex.Cst orlex.Xst.

preceding returns a character vector with names of the statesof the Lexis objectx from which one of the states instates can be reached directly - the precedingstates.before is just a synonym forpreceding.

succeeding returns a character vector with names of the statesof the Lexis objectx that can be reached directly from one ofthe states instates.after is just a synonym forsucceeding.

Author(s)

Martyn Plummer & Bendix Carstensen

Compute survival functions from rates and expected residuallifetime in an illness-death model as well as years of life lost to disease.

Description

These functions compute survival functions from a set of mortality anddisease incidence rates in an illness-death model. Expected residuallife time can be computed under various scenarios by theerlfunction, and areas between survival functions can be computed undervarious scenarios by theyll function. Rates are assumedsupplied for equidistant intervals of lengthint.

Usage

  surv1( int, mu ,                age.in = 0, A = NULL )   erl1( int, mu ,                age.in = 0 )   surv2( int, muW, muD, lam,      age.in = 0, A = NULL )    erl( int, muW, muD, lam=NULL, age.in = 0, A = NULL,         immune = is.null(lam), yll=TRUE, note=TRUE )    yll( int, muW, muD, lam=NULL, age.in = 0, A = NULL,         immune = is.null(lam), note=TRUE )

Arguments

int

Scalar. Length of intervals that rates refer to.

mu

Numeric vector of mortality rates at midpoints of intervals of lengthint

muW

Numeric vector of mortality rates among persons in the "Well" state atmidpoints of intervals of lengthint. Left endpoint of firstinterval isage.in.

muD

Numeric vector of mortality rates among persons in the "Diseased" stateat midpoints of intervals of lengthint. Left endpoint of firstinterval isage.in.

lam

Numeric vector of disease incidence rates among persons in the "Well" stateat midpoints of intervals of lengthint. Left endpoint of firstinterval isage.in.

age.in

Scalar indicating the age at the left endpoint of the first interval.

A

Numeric vector of conditioning ages for calculation of survivalfunctions.

immune

Logical. Should the years of life lost to the disease be computedusing assumptions that non-diseased individuals are immune to thedisease (lam=0) and that their mortality is yet stillmuW.

note

Logical. Should a warning of silly assumptions be printed?

yll

Logical. Should years of life lost be included in the result?

Details

The mortality rates given are supposed to refer to the agesage.in+(i-1/2)*int,i=1,2,3,....

The units in whichint is given must correspond to the units inwhich the ratesmu,muW,muD andlam aregiven. Thus ifint is given in years, the rates must be givenin the unit of events per year.

The ages in which the survival curves are computed are fromage.in and then at the end oflength(muW)(length(mu)) intervals each of lengthint.

Theage.in argument is merely a device to account for ratesonly available from a given age. It has two effects, one is thatlabeling of the interval endpoint is offset by this quantity, thusstarting atage.in, and the other that the conditioning agesgiven in the argumentA will refer to the ages defined by this.

Theimmune argument isFALSE whenever the diseaseincidence rates are supplied. If set toTRUE, the years of lifelost is computed under the assumption that individuals without thedisease at a given age are immune to the disease in the sense that thedisease incidence rate is 0, so transitions to the diseased state(with presumably higher mortality rates) are assumed not tooccur. This is a slightly peculiar assumption (but presumably the mostused in the epidemiological literature) and the resulting object istherefore given an attribute,NOTE, that point this out.

If howevermuW is the total mortality in the population(including the diseased) the result is a good approximation to thecorrect YLL.

The default of thesurv2 function is to take the possibility ofdisease into account.

Value

surv1 andsurv2 return a matrix whose first columnis the ages at the ends of the intervals, thus withlength(mu)+1 rows. The following columnsare the survival functions (sinceage.in), and conditional onsurvival till ages as indicated inA, thus a matrix withlength(A)+2 columns. Columns are labeled with the actualconditioning ages; ifA contains values that are not among theendpoints of the intervals used, the nearest smaller interval borderis used as conditioning age, and columns are named accordingly.

surv1 returns the survival function for a simple model with onetype of death, occurring at intensitymu.

surv2 returns the survival function for a person in the "Well"state of an illness-death model, taking into account that the personmay move to the "Diseased" state, thus requiring all three transitionrates to be specified. The conditional survival functions areconditional on being in the "Well" state at ages given inA.

erl1 returns a three column matrix with columnsage,surv (survival function) anderl (expected residual lifetime) withlength(mu)+1 rows.

erl returns a two column matrix, columns labeled "Well" and"Dis", and with row-labelsA. The entries are the expectedresidual life times given survival toA. Ifyll=TRUE thedifference between the columns is added as athird column, labeled "YLL".

Author(s)

Bendix Carstensen,b@bxc.dk

Examples

library( Epi )data( DMlate )# Naive Lexis objectLx <- Lexis( entry = list( age = dodm-dobth ),              exit = list( age = dox -dobth ),       exit.status = factor( !is.na(dodth), labels=c("DM","Dead") ),              data = DMlate )# Cut follow-up at insulin inceptionLc <- cutLexis( Lx, cut = Lx$doins-Lx$dob,              new.state = "DM/ins",       precursor.states = "DM" )summary( Lc )# Split in small age intervalsSc <- splitLexis( Lc, breaks=seq(0,120,2) )summary( Sc )# Overview of objectboxes( Sc, boxpos=TRUE, show.BE=TRUE, scale.R=100 )# Knots for splinesa.kn <- 2:9*10# Mortality among DMmW <- glm( lex.Xst=="Dead" ~ Ns( age, knots=a.kn ),           offset = log(lex.dur),           family = poisson,             data = subset(Sc,lex.Cst=="DM") )# Mortality among insulin treatedmI <- update( mW, data = subset(Sc,lex.Cst=="DM/ins") )# Total motalitymT <- update( mW, data = Sc )# Incidence of insulin inceptionlI <- update( mW, lex.Xst=="DM/ins" ~ . )# From these we can now derive the fitted rates in intervals of 1 year's# length. In real applications you would use much smaller interval like# 1 month:# int <- 1/12 int <- 1# Prediction frame to return rates in units of cases per 1 year# - we start at age 40 since rates of insulin inception are largely# indeterminate before age 40nd <- data.frame( age = seq( 40+int, 110, int ) - int/2,              lex.dur = 1 )muW <- predict( mW, newdata = nd, type = "response" )muD <- predict( mI, newdata = nd, type = "response" )lam <- predict( lI, newdata = nd, type = "response" )# Compute the survival function, and the conditional from ages 50 resp. 70s1 <- surv1( int, muD, age.in=40, A=c(50,70) )round( s1, 3 )s2 <- surv2( int, muW, muD, lam, age.in=40, A=c(50,70) )round( s2, 3 )# How much is YLL overrated by ignoring insulin incidence?round( YLL <- cbind(yll( int, muW, muD, lam, A = 41:90, age.in = 40 ),yll( int, muW, muD, lam, A = 41:90, age.in = 40, immune=TRUE ) ), 2 )[seq(1,51,10),]par( mar=c(3,3,1,1), mgp=c(3,1,0)/1.6, bty="n", las=1 )matplot( 40:90, YLL,         type="l", lty=1, lwd=3,         ylim=c(0,10), yaxs="i", xlab="Age" )

Rates of lung and nasal cancer mortality, and total mortality.

Description

England and Wales mortality rates from lung cancer, nasal cancer,and all causes 1936 - 1980. The 1936 rates are repeated as 1931 rates inorder to accommodate follow up for thenickel study.

Usage

data(ewrates)

Format

A data frame with 150 observations on the following 5 variables:

`id`:	Subject identifier (numeric)
`year`	Calendar period, 1931: 1931--35, 1936: 1936--40, ...
`age`	Age class: 10: 10--14, 15:15--19, ...
`lung`	Lung cancer mortality rate per 1,000,000 py.
`nasal`	Nasal cancer mortality rate per 1,000,000 py.
`other`	All cause mortality rate per 1,000,000 py.

Source

From Breslow and Day, Vol II, Appendix IX.

Examples

data(ewrates)str(ewrates)

Function to expand data for regression analysis of interval censoreddata.

Description

This is a utility function.

The original records withfirst.well,last.well andfirst.ill areexpanded to multiple records; several for each interval where theperson is known to be well and one where the person is known to fail.At the same time columns for the covariates needed to estimate ratesand the response variable are generated.

Usage

expand.data(fu, formula, breaks, data)

Arguments

fu

A 3-column matrix withfirst.well,last.well andfirst.ill in each row.

formula

Model fromula, used to derive the model matrix.

breaks

Defines the intervals in which the baseline rate isassumed constant. All follow-up before the first and after thelast break is discarded.

data

Datafrem in whichfu andformula is interpreted.

Value

Returns a list with three components

rates.frame

Dataframe of covariates for estimation of thebaseline rates — one per interval defined bybreaks.

cov.frame

Dataframe for estimation of the covariate effects. Adata-framed version of the designmatrix fromformula.

y

Response vector.

Author(s)

Martyn Plummer,martyn.plummer@r-project.org

References

B Carstensen: Regression models for interval censoredsurvival data: application to HIV infection in Danish homosexualmen. Statistics in Medicine, 15(20):2177-2189, 1996.

Fit an additive excess risk model to interval censored data.

Description

Utility function.

The model fitted assumes a piecewise constant intensity for thebaseline, and that the covariates act additively on the rate scale.

Usage

  fit.add( y, rates.frame, cov.frame, start )

Arguments

y

Binary vector of outcomes

rates.frame

Dataframe expanded from the original data byexpand.data, cooresponding to covariates for the rateparameters.

cov.frame

do., but covariates corresponding to theformula argument ofIcens

start

Starting values for the rate parameters. If not supplied,then starting values are generated.

Value

A list with one component:

rates

A glm object from a binomial model with log-link function.

Author(s)

Martyn Plummer,martyn.plummer@r-project.org

References

B Carstensen: Regression models for interval censoredsurvival data: application to HIV infection in Danish homosexualmen. Statistics in Medicine, 15(20):2177-2189, 1996.

CP Farrington: Interval censored survival data: a generalized linearmodelling approach. Statistics in Medicine, 15(3):283-292, 1996.

Examples

  data( HIV.dk )

Fit a piecewise contsnt intesity model for interval censored data.

Description

Utility function

Fits a binomial model with logaritmic link, withy as outcomeand covariates inrates.frame to estimate rates in theinttervals betweenbreaks.

Usage

fit.baseline( y, rates.frame, start )

Arguments

y

Binary vector of outcomes

rates.frame

Dataframe expanded from the original data byexpand.data

start

Starting values for the rate parameters. If not supplied,then starting values are generated.

Value

Aglm object, with binomial error and logaritmic link.

Author(s)

Martyn Plummer,martyn.plummer@r-project.org

Fits a multiplicative relative risk model to interval censored data.

Description

Utility function.

The model fitted assumes a piecewise constant baseline rate inintervals specified by the argumentbreaks, and amultiplicative relative risk function.

Usage

  fit.mult( y, rates.frame, cov.frame, start )

Arguments

y

Binary vector of outcomes

rates.frame

Dataframe expanded from the original data byexpand.data, cooresponding to covariates for the rateparameters.

cov.frame

do., but covariates corresponding to theformula argument ofIcens

start

Starting values for the rate parameters. If not supplied,then starting values are generated.

Details

The model is fitted by alternating between two generalized linearmodels where one estimates the underlying rates in the intervals, andthe other estimates the log-relative risks.

Value

A list with three components:

rates

A glm object from a binomial model with log-link,estimating the baseline rates.

cov

A glm object from a binomial model with complementarylog-log link, estimating the log-rate-ratios

niter

Nuber of iterations, a scalar

Author(s)

Martyn Plummer,martyn.plummer@r-project.org,Bendix Carstensen,b@bxc.dk

References

B Carstensen: Regression models for interval censoredsurvival data: application to HIV infection in Danish homosexualmen. Statistics in Medicine, 15(20):2177-2189, 1996.

CP Farrington: Interval censored survival data: a generalized linearmodelling approach. Statistics in Medicine, 15(3):283-292, 1996.

Examples

  data( HIV.dk )

Calculate floated variances

Description

Given a fitted model object, thefloat() function calculatesfloating variances (a.k.a. quasi-variances) for a given factor in the model.

Usage

float(object, factor, iter.max=50, ...)

Arguments

object

a fitted model object.

factor

character string giving the name of the factor ofinterest. If this is not given, the first factor in the model is used.

iter.max

Maximum number of iterations for EM algorithm.

...

Optional arguments passed to thevcov() method forthe fitted model object.

Details

Thefloat() function implements the "floating absolute risk"proposal of Easton, Peto and Babiker (1992). This is an alternative wayof presenting parameter estimates for factors in regression models,which avoids some of the difficulties of treatment contrasts. It wasoriginally designed for epidemiological studies of relative risk, butthe idea is widely applicable.

Treatment contrasts are not orthogonal. Consequently, the variances oftreatment contrast estimates may be inflated by a poor choice ofreference level, and the correlations between them may also be high.Thefloat() function associates each level of the factor with a"floating" variance (or quasi-variance), including the referencelevel. Floating variances are not real variances, but they can beused to calculate the variance error of contrast by treating eachlevel as independent.

Plummer (2003) showed that floating variances can be derived from acovariance structure model applied to the variance-covariance matrixof the contrast estimates. This model can be fitted by minimizing theKullback-Leibler information divergence between the true distributionof the parameter estimates and the simplified distribution given bythe covariance structure model. Fitting is done using the EMalgorithm.

In order to check the goodness-of-fit of the floating variance model,thefloat() function compares the standard errors predicted bythe model with the standard errors derived from the truevariance-covariance matrix of the parameter contrasts. The maximum andminimum ratios between true and model-based standard errors arecalculated over all possible contrasts. These should be within 5percent, or the use of the floating variances may lead to invalidconfidence intervals.

Value

An object of classfloated. This is a list with the followingcomponents

coef

A vector of coefficients. These are the same as thetreatment contrasts but the reference level is present withcoefficient 0.

var

A vector of floating (or quasi-) variances

limits

The bounds on the accuracy of standard errors over allpossible contrasts

Note

Menezes(1999) and Firth and Menezes (2004) take a slightly differentapproach to this problem, using a pseudo-likelihood approach to fitthe quasi-variance model. Their work is implemented in the packageqvcalc.

Author(s)

Martyn Plummer

References

Easton DF, Peto J and Babiker GAG (1991) Floating absolute risk: Analternative to relative risk in survival and case control analysisavoiding an arbitrary reference group.Statistics in Medicine,10, 1025-1035.

Firth D and Mezezes RX (2004) Quasi-variances.Biometrika91, 65-80.

Menezes RX(1999) More useful standard errors for group and factoreffects in generalized linear models.D.Phil. Thesis,Department of Statistics, University of Oxford.

Plummer M (2003) Improved estimates of floating absolute risk,Statistics in Medicine,23, 93-104.

Create a data structures suitable for use with packages`mstate` or`etm`.

Description

Themstate package requires input in the form of a stackeddataset with specific variable names. This is provided bymsdata.Lexis. The resulting dataframe contains the sameinformation as the result of a call tostack.Lexis.

Theetm package requires input (almost) in the form of aLexis object, but with specific column names etc. This isprovided byetm.Lexis.

Usage

msdata(obj, ...)## S3 method for class 'Lexis'msdata(obj,                time.scale = timeScales(obj)[1],                       ... )## S3 method for class 'Lexis'etm( data,               time.scale = timeScales(data)[1],                cens.name = "cens",                        s = 0,                        t = "last",               covariance = TRUE,                 delta.na = TRUE,                      ... )

Arguments

obj

ALexis object.

data

ALexis object.

time.scale

Name or number of timescale in theLexisobject.

cens.name

Name of the code for censoring used byetm. Itis only necessary to change this if one of the states in theLexis object has name "cens".

s

Passed on toetm.

t

Passed on toetm.

covariance

Passed on toetm.

delta.na

Passed on toetm.

...

Further arguments.

Value

msdata.Lexis returns a dataframe with theLexis specificvariables stripped, and with the following added:id,Tstart,Tstop,from,to,trans,status, which are used in themstate package.

etm.Lexis transforms theLexis object into a dataframesuitable for analysis by the functionetm from theetmpackage, and actually calls this function, so returns an object ofclassetm.

Author(s)

Bendix Carstensen,b@bxc.dk,http://bendixcarstensen.com

Examples

data(DMlate)str(DMlate)dml <- Lexis( entry = list(Per=dodm,Age=dodm-dobth,DMdur=0),               exit = list(Per=dox),        exit.status = factor(!is.na(dodth),labels=c("DM","Dead")),               data = DMlate[1:1000,] )dmi <- cutLexis( dml, cut=dml$doins, new.state="Ins", pre="DM" )summary( dmi )# Use the interface to the mstate packageif( require(mstate) ){ms.dmi <- msdata.Lexis( dmi )# Check that all the transitions and person-years got across.with( ms.dmi, rbind( table(status,trans),                     tapply(Tstop-Tstart,trans,sum) ) )}# Use the etm package directly with a Lexis objectif( require(etm) ){dmi <- subset(dmi,lex.id<1000)etm.D <- etm.Lexis( dmi, time.scale=3 )str( etm.D )plot( etm.D, col=rainbow(5), lwd=2, lty=1, xlab="DM duration" )}

Fit a floating trend to a factor in generalized linear model

Description

Fits a "floating trend" model to the given factor in a glm in a generalizedlinear model by centering covariates.

Usage

ftrend(object, ...)

Arguments

object

fittedlm orglm object. The model must not have an intercept term

...

arguments to thenlm function

Details

ftrend() calculates "floating trend" estimates for factors ingeneralized linear models. This is an alternative to treatmentcontrasts suggested by Greenland et al. (1999). If a regression modelis fitted with no intercept term, then contrasts are not used for thefirst factor in the model. Instead, there is one parameter for eachlevel of this factor. However, the interpretation of theseparameters, and their variance-covariance matrix, depends on thenumerical coding used for the covariates. If an arbitrary constant isadded to the covariate values, then the variance matrix is changed.

Theftrend() function takes the fitted model and works out an optimalconstant to add to the covariate values so that the covariance matrix isapproximately diagonal. The parameter estimates can then be treated asapproximately independent, thus simplifying their presentation. This isparticularly useful for graphical display of dose-response relationships(hence the name).

Greenland et al. (1999) originally suggested centring the covariates so thattheir weighted mean, using the fitted weights from the model, is zero. Thisheuristic criterion is improved upon byftrend() which uses the sameminimum information divergence criterion as used by Plummer (2003) forfloating variance calculations.ftrend() callsnlm() todo the minimization and will pass optional arguments to control it.

Value

A list with the following components

coef

coefficients for model with adjusted covariates.

vcov

Variance-covariance matrix of adjusted coefficients.

Note

The "floating trend" method is an alternative to the "floatingabsolute risk" method, which is implemented in the functionfloat().

Author(s)

Martyn Plummer

References

Greenland S, Michels KB, Robins JM, Poole C and Willet WC (1999)Presenting statistical uncertainty in trends and dose-response relations,American Journal of Epidemiology,149, 1077-1086.

Generate covariates for drug-exposure follow-up from drug purchase records.

Description

From records of drug purchase and possibly known treatment intensity,the time since first drug use and cumulative dose at prespecified timesis computed. Optionally, lagged exposures are computed too,i.e. cumulative exposure a prespecified time ago.

Usage

gen.exp( purchase,, dop="dop", amt="amt", dpt="dpt",               fu, doe="doe", dox="dox",           breaks,          use.dpt = ( dpt %in% names(purchase) ),         push.max = Inf,          rm.dose = FALSE,             lags = NULL,          lag.dec = 1,          lag.pre = "lag.",         pred.win = Inf )

Arguments

purchase

Data frame with columnsid-person id,dop -dateofpurchase,amt -amount purchased, and optionallydpt -(dosepertime) ("defined daily dose", DDD,that is), how much is assumed to be ingested per unit time. Theunits used fordpt is assumed to be units ofamt perunits ofdop.

id

Character. Name of the id variable in the data frame.

dop

Character. Name of thedateofpurchase variable in the data frame.

amt

Character. Name of theamount purchasedvariable in the data frame.

dpt

Character. Name of thedose-per-timevariable in the data frame.

fu

Data frame withfollow-up period for eachperson, the person id variable must have the same name as in thepurchase data frame.

doe

Character. Name of thedateofentryvariable.

dox

Character. Name of thedateof exitvariable.

breaks

Numerical vector of dates at which the time since firstexposure, cumulative dose etc. are computed.

use.dpt

Logical: should we use information on dose per time.

push.max

Numerical. How much can purchases maximally be pushedforward in time. See details.

rm.dose

Logical. Should the dose from omitted period ofexposure (due to the setting ofpush.max) be ignored. IfFALSE, the cumulative dose will be the cumulation of theactually purchased amounts, regardless of how far the inceptiondates have been pushed.

lags

Numerical vector of lag-times used in computing laggedcumulative doses.

lag.dec

How many decimals to use in the construction of namesfor the lagged exposure variables

lag.pre

Character string used for prefixing names of laggedexposure variables. Aimed to facilitate the use ofgen.expfor different drugs with the aim of merging information.

pred.win

The length of the window used for constructing theaverage dose per time used to compute the duration of the lastpurchase. Only used whenuse.dpt=FALSE. The default valueInf corresponds to using the time between first and lastpurchase of drug as the interval for computing average consumptionper time, and thus the termination of use.

Details

The intention of this function is to generate covariates for aparticular drug for the entire follow-up of each person. The reasonthat the follow-up prior to first drug purchase and post-exposure isincluded is that the covariates must be defined for all follow-up foreach person in order to be useful for analysis of disease outcomes.

The functionality is described in terms of calendar time as underlyingtime scale, because this will normally be the time scale for drugpurchases and for entry and exit for persons. In principle thevariables termed as dates might equally well refer to say the agescale, but this would then have to be trueboth for thepurchase data, the follow-up data and thebreaks argument.

Drug purchase records (inpurchase) are used to constructmeasures of drug exposure at prespecified timepoints (inbreaks) in follow-up intervals (infu). Each person mayhave more than one follow-up interval. They should be disjoint, butthis is not checked.

Ifuse.dpt isTRUE then the dose per time information isused to compute the exposure interval associated with each purchase.Exposure intervals are stacked, that is each interval is put after anyprevious. This means that the start of exposure to a given purchasecan be pushed into the future. The parameterpush.max indicatesthe maximally tolerated push. If this is reached by a person, theassumption is that some of the purchased drug may not be counted inthe exposure calculations — seerm.dose.

Thedpt can either be a constant, basically translating eachpurchased amount into exposure time the same way for all persons, orit can be a vector with different treatment intensities for eachpurchase. In any case the cumulative dose is computed takingdpt into account, unlessrm.dose isFALSE inwhich case the actual purchased amount is cumulated. The latter isslightly counter-intuitive because we are using thedpt to pushthe intervals, and then disregard it when computing the cumulativedose. The counter argument is that if the limitpush.max isreached, the actual dosage may be larger than indicated thedpt, and is essentially what this allows for.

Ifuse.dpt isFALSE then the exposure from one purchaseis assumed to stretch over the time to the next purchase, so we areeffectively allowing different dosing rates (dose per time) betweenpurchases. Formally this approach conditions on the future, becausethe rate of consumption (the accumulation of cumulative exposure) iscomputed based on knowledge of when next purchase is made. Moreover,with this approach, periods of non-exposure does not exist, exceptafter the last purchase where the future consumption rate is taken tobe the average over the period of use (or a period of lengthpred.win), and hence defines a date of cessation of drug.

Finally, ifuse.dpt isFALSE, at least two purchaserecords are required to compute the measures. Therefore persons withonly one drug purchase record are ignored in calculations.

Value

A data frame with one record per person and follow-up date(breaks). Date of entry and date of exit are included too; butonly follow-up in the intersetion ofrange(breaks) andrange(fu$doe,fu$dox) is output.

id: person id.
dof: date of follow up, i.e. start of interval. Apartfrom possibly the first interval for each person, this will assumevalues in the set of the values inbreaks. All other variablesrefer to status as of this date.
dur: the length (duration) of interval.
tfi: timefrom firstinitiation of drug.
off: Logical, indicating whether the person isoff drug. So it isFALSE if the person is exposed atdof.
doff: date of latest transition tooffdrug. Note that tis defined also at dates after drug exposure has beenresumed.
tfc: timefrom latestcessation of drug.
ctim: cumulativetime on the drug.
cdos: cumulativedose.
ldos: suffixed with one value per element inlags, the latter giving the cumulative doseslags beforedof.

Author(s)

Bendix Carstensen,b@bxc.dk. The development ofthis function was supported partly through a grant from the EFSD(European Foundation for the Study of Diabetes)

Examples

# Example data for drug purchases in 3 persons --- dates (dop) are# measured in years, amount purchased (amt) in no. pills and dose per# time (dpt) consequently given in units of pills/year. Note we also# include a person (id=4) with one purchase record only.n <- c( 10, 18, 8, 1 )hole <- rep(0,n[2])hole[10] <- 2 # to create a hole of 2 years in purchase dates# dates of drug purchasedop <- c( 1995.278+cumsum(sample(1:4/10,n[1],replace=TRUE)),          1992.351+cumsum(sample(1:4/10,n[2],replace=TRUE)+hole),          1997.320+cumsum(sample(1:4/10,n[3],replace=TRUE)),          1996.470 )# purchased amounts mesured in no. pillsamt <- sample( 1:3*50 , sum(n), replace=TRUE )# prescribed dosage therefore necessarily as pills per year dpt <- sample( 4:1*365, sum(n), replace=TRUE )# collect to purchase data framedfr <- data.frame( id = rep(1:4,n),                  dop,                  amt = amt,                  dpt = dpt )head( dfr, 3 )# a simple dataframe for follow-up periods for these 4 personsfu <- data.frame( id = 1:4,                 doe = c(1995,1992,1996,1997)+1:4/4,                 dox = c(2001,2003,2002,2010)+1:4/5 )fu# Note that the following use of gen.exp relies on the fact that the# purchase dataframe dfr has variable names "id", "dop", "amt" and# "dpt"" and the follow-up data frame fu has variable names "id",# "doe" and "dox"# 1: using the dosage informationdposx <- gen.exp( dfr,                   fu = fu,              use.dpt = TRUE,               breaks = seq(1990,2015,0.5),                 lags = 2:4/4,              lag.pre = "l_" )format( dposx, digits=5 )# 2: ignoring the dosage information,#    hence person 4 with only one purchase is omittedxposx <- gen.exp( dfr,                   fu = fu,              use.dpt = FALSE,               breaks = seq(1990,2015,0.5),                 lags = 2:3/5 )format( xposx, digits=5 )# It is possible to have disjoint follow-up periods for the same person:fu <- fu[c(1,2,2,3),]fu$dox[2] <- 1996.2fu$doe[3] <- 1998.3fu# Note that drug purchase information for the period not at risk *is* useddposx <- gen.exp( dfr,                   fu = fu,              use.dpt = TRUE,               breaks = seq(1990,2015,0.1),                 lags = 2:4/4 )format( dposx, digits=5 )

Population mortality rates for Denmark in 5-years age groups.

Description

ThegmortDK data frame has 418 rows and 21 columns.

Format

This data frame contains the following columns:

`agr`:	Age group, 0:0--4, 5:5--9,..., 90:90+.
`per`:	Calendar period, 38: 1938--42, 43: 1943--47, ..., 88:1988-92.
`sex`:	Sex, 1: male, 2: female.
`risk`:	Number of person-years in the Danish population.
`dt`:	Number of deaths.
`rt`:	Overall mortality rate in cases per 1000 person-years, i.e.`rt=1000*dt/risk`
	Cause-specific mortality rates in cases per 1000 person-years:
`r1`:	Infections
`r2`:	Cancer.
`r3`:	Tumors, benign, unspecific nature.
`r4`:	Endocrine, metabolic.
`r5`:	Blood.
`r6`:	Nervous system, psychiatric.
`r7`:	Cerebrovascular.
`r8`:	Cardiac.
`r9`:	Respiratory diseases, excl. cancer.
`r10`:	Liver, excl. cancer.
`r11`:	Digestive, other.
`r12`:	Genitourinary.
`r13`:	Ill-defined symptoms.
`r14`:	All other, natural.
`r15`:	Violent.

Source

Statistics Denmark, National board of health provided original data. Michael Andersson grouped the causes of death.

Examples

data(gmortDK)

Create a basis of harmonic functions.

Description

Returns a matrix of harmonic functions usable for modelingperiodic effects

Usage

harm(x, ord=1, per=1, verbose=FALSE )

Arguments

x

A numeric variable.

ord

Integer, the order of the harmonic.

per

Numeric, the length of the period on thex scale.

verbose

Logical: shall I tell what I do with dates?

Details

Columns are constructed under the assumption that the periodic functionhas periodper on thex scale. Thus, the first columnsis defined assin(2*pi*x/per),cos(2*pi*x/per),sin(4*pi*x/per) etc.

Sincesin andcos are periodic functions there is norequirement thatx be in any particular range.

Value

A matrix withnrow(x) rows and2*deg columns and columnnamessin1,cos1,sin2,cos2 etc.

Author(s)

Bendix Carstensen,http://bendixcarstensen.com

Examples

x <- seq(-1,1,0.01)head( harm(x,ord=2) )matplot( x, harm(x,ord=2), type="l", lty=1, lwd=3 )

Is`x` in the column span of matrix`A` and what columns arelinearly dependent?

Description

The functionin.span checks if the vectorx (orcolumns of the matrixx) is in the column span of the matrixA. If desired, it returns the coefficient matrixB so thatAB=x. The functionthinCol removes linearly dependentcolumns an returns a matrix of full rank.

Usage

in.span( A,         x,      coef = FALSE,       tol = 1e-08 )inSpan( A, x, coef=FALSE, tol=1e-08 )id.span( A, B, tol=1e-08 )idSpan( A, B, tol=1e-08 )thinCol( A, tol = 1e-06, col.num = FALSE )

Arguments

A

A matrix.

B

A matrix.

x

A vector or matrix.length(x) (ornrow(x)) mustbe equal tonrow(A).

coef

Logical. Should the coefficient matrix (k) bereturned, so thatAk=x?

tol

Tolerance for identity of matrices in check(in.span) or QR decomposition (thinCol)

col.num

Logical. Should the positions of dependent columns bereturned instead of the full-rank matrix?

Details

thinCol is mainly a workhorse indetrend, but made available because of its generalusefulness.

in.span andinSpan are just different names for the sameto accommodate different naming schools.

in.span (inSpan) is handy in checking whether differentparametrizations of a model are identical in the sense of spanning thesame linear space. Equivalent to checking whether fitted values underdifferent parametrizations are identical, but has the further use ofchecking if subspaces of models are equivalent. The functionsimply checks if the regression of (columns of)x on thecolumns ofA produces residuals that are all 0.

id.span (equivalent toidSpan) checks whether twomatrices have the same column span.

Value

in.span returns a logical: isx is inspan(A)? Ifcoef=TRUE it returns a matrixk sothatAk=x.k is not necessarily unique (A may not havefull rank).

id.span returns a logical: isspan(A) the same asspan(B)?

thinCol returns a matrix of full rank, formed fromA bydeleting columns linearly dependent on other. Ifcol.num=TRUE(one possible set of) positions of columns forming a full rank basisfor the column space ofA is returned.

Author(s)

Bendix Carstensen,http://bendixcarstensen.com with essentialhelp from Lars Jorge Diaz and Peter Dalgaard.

Examples

# Matrices and vectors, x in span(A), z (hopefully) notA <- matrix(round(rnorm(15)*20),5,3)B <- matrix(round(rnorm(15)*20),5,3)B <- cbind( B, B%*%c(3,4,2) )x <- A %*% c(3,4,2)z <- 5:9# how they lookdata.frame( A=A, x=x, z=z, B=B )# vectors in span(A)?in.span(A,x)in.span(x,A)in.span(A,x,coef=TRUE)in.span(A,z)in.span(A,z,coef=TRUE)# Do matrices span the same space ?in.span( A, B )in.span( B, A )# B is not in span of a subspace of B columns, but vice versa( M <- matrix( rnorm(8)*7, 4, 2 ) )in.span( B%*%M, B )in.span( B, B%*%M )id.span( B, B%*%M )# But not unique for singular matrices:( xx <- in.span( B, B%*%M, coef=TRUE ) )cbind( B%*%M, B%*%xx )cbind( xx, M )# Easier for full rank matrices:( K <- matrix( rnorm(9)*7, 3, 3 ) )in.span( A%*%K, A )in.span( A, A%*%K )id.span( A, A%*%K )in.span( A, A%*%K, coef=TRUE )

Draw a box with text explaining the numbers in and between boxesfrom`boxes.MS` and`boxes.Lexis`

Description

When drawing boxes describing a multistate model a legendexplaining the numbers in the plot is required.legendbox doesthis.

Usage

legendbox(x, y,         state = "State",            py = "Person-time",         begin = "no. begin",           end = "no. end",         trans = "Transitions",         rates = "\n(Rate)",          font = 1,         right = !left,          left = !right,           ...)

Arguments

x

x-coordinate of the center of the box.

y

y-coordinate of the center of the box.

state

Text describing the state

py

Text describing the risk time

begin

Text describing the no. persons starting FU in state

end

Text describing the no. persons ending FU in state

trans

Text describing the no. of transitions

rates

Text describing the rates

font

Font to use for the text

right

Should a text describing arrow texts be on the r.h.s. of the box?Defaults to TRUE.

left

Should a text describing arrow texts be on the l.h.s. of the box?

...

Arguments passed on totbox

Details

The function is called for its side effect of adding anexplanatory box to the plot. Ifright is true, an explanationof events and rates are added to the right of the box. Similarly forleft. It is admissible thatleft == right.

Value

None.

Author(s)

Bendix Carstensen,http://bendixcarstensen.com

An unmatched case-control study of leprosy incidence

Description

Thelep data frame has 1370 rows and 7 columns. This was anunmatched case-control study in which incident cases of leprosy in aregion of N. Malawi were compared with population controls.

Format

This data frame contains the following columns:

`id`:	subject identifier: a numeric vector
`d`:	case/control status: a numeric vector (1=case, 0=control)
`age`:	a factor with levels`5-910-1415-1920-2425-2930-4445+`
`sex`:	a factor with levels`male`,`female`
`bcg`:	presence of vaccine scar, a factor with levels`noyes`
`school`:	schooling, a factor with levels`none1-5yrs6-8yrssec/tert`
`house`:	housing, a factor with levels`bricksunbrickwattletemp`

Source

The study is described in more detail in Clayton and Hills, StatisticalModels in Epidemiology, Oxford University Press, Oxford:1993.

Examples

data(lep)

Convenience versions of grep

Description

Often you want the elements of a vector (or its names or levels) thatmeet a certain pattern. Butgrep only gives you the position, sothese functions are designed to give you that.

Usage

fgrep( pattern, x, ... )ngrep( pattern, x, ... )lgrep( pattern, x, ... )

Arguments

pattern

Pattern searched for.

x

Object wherepattern is searched. Or in whosenamesorlevels attributespattern is sought.

...

Arguments passed on togrep.

Value

Elements of the inputx (fgrep) or its namesattribute (ngrep) or levels attribute (lgrep).

Author(s)

Bendix Carstensen,b@bxc.dk,http://bendixcarstensen.com

Examples

ff <- factor( ll <- paste( sample( letters[1:3], 20, replace=TRUE ),                           sample( letters[1:3], 20, replace=TRUE ), sep="" ) )fffgrep( "a", ff )fgrep( "a", ll )ngrep( "a", ff )lgrep( "a", ff )lgrep( "a", ff, invert=TRUE )

Functions to manage and explore the workspace

Description

These functions help you to find out what has gone wrong and tostart afresh if needed.

Usage

lls(pos = 1, pat = "", all=FALSE, print=TRUE )clear()

Arguments

pos

Numeric. What position in the search path do you want listed.

pat

Character. List only objects that have this string in their name.

all

Logical. Should invisible objects be printed too -seels to which this argument is passed.

print

Logical. Should the result be printed?

Details

lls is designed to give a quick overview of the name, mode, classand dimension of the object in your workspace. They may not always be what youthink they are.

clear clears all your objects from workspace, and all attached objectstoo — it only leaves the loaded packages in the search path; thus allowing afresh start without closing and restarting R.

Value

lls returns a data frame with four character variables:name,mode,class andsize and one row per object in the workspace (ifpos=1).size is either the length or the dimension of the object.The data frame is by default printed with left-justified columns.

Author(s)

lls: Unknown. Modified by Bendix Carstensen from a longforgotten snatch.

clear: Michael Hills / David Clayton.

Examples

x <- 1:10y <- rbinom(10, 1, 0.5)m1 <- glm( y ~ x, family=binomial )M <- matrix( 1:20, 4, 5 ).M <- Mdfr <- data.frame(x,y)attach( dfr )lls()search()clear()search()lls()lls(all=TRUE)

Male lung cancer incidence in Denmark

Description

Male lung cancer cases and population riks time in Denmark, for theperiod 1943–1992 in ages 40–89.

Usage

data(lungDK)

Format

A data frame with 220 observations on the following 9 variables.

`A5`:	Left end point of the age interval, a numeric vector.
`P5`:	Left enpoint of the period interval, a numeric vector.
`C5`:	Left enpoint of the birth cohort interval, a numeric vector.
`up`:	Indicator of upper trianges of each age by period rectangle in the Lexis diagram. (`up=(P5-A5-C5)/5`).
`Ax`:	The mean age of diagnois (at risk) in the triangle.
`Px`:	The mean date of diagnosis (at risk) in the triangle.
`Cx`:	The mean date of birth in the triangle, a numeric vector.
`D`:	Number of diagnosed cases of male lung cancer.
`Y`:	Risk time in the male population, person-years.

Details

Cases and person-years are tabulated by age and date ofdiagnosis (period) as well as date of birth (cohort) in 5-yearclasses. Each observation in the dataframe correponds to a triangle ina Lexis diagram. Triangles are classified by age and date ofdiagnosis, period of diagnosis and date of birth, all in 5-yeargroupings.

Source

The Danish Cancer Registry and Statistics Denmark.

References

For a more thorough exposition of statistical inference in the Lexisdiagram, see: B. Carstensen: Age-Period-Cohort models for the Lexisdiagram. Statistics in Medicine, 26: 3018-3045, 2007.

Examples

data( lungDK )# Draw a Lexis diagram and show the number of cases in it.attach( lungDK )Lexis.diagram( age=c(40,90), date=c(1943,1993), coh.grid=TRUE )text( Px, Ax, paste( D ), cex=0.7 )

Plot columns of a matrix as stacked areas.

Description

matrix topolygon: Plot columns of a matrixas stacked areas.

Usage

mat2pol( pm,       perm = 1:ncol(pm),          x = as.numeric(rownames(pm)),        col = rainbow(ncol(pm)),         yl = 0:1,     append = FALSE,        ... )

Arguments

pm

Numerical matrix.

perm

integer vector of lengthncol(pm), used to permutethe columns ofpm.

x

Numeric. The x-axis of the plot.

col

Colors of the areas.

yl

y-axis limits.

append

Logical. Should the polygons be added to an exiating plot

...

Further parameters passed toplot.

Details

The function is originally intended to plot stackedprobabilities, hence the default of0:1 for the y-axis.

Value

A matrix ofncol(pm)+1 columns with the first equal to 0,and the remaining the cumulative sum of the columns ofpm[perm].

The function is called for its side effect - the stacked polygons.

Author(s)

Bendix Carstensen

Examples

M <- cbind( sort(runif(10)), sort(runif(10)), sort(runif(10)) )pm <- sweep( M, 1, apply(M,1,sum), "/" )mat2pol( pm )

Plot confidence intervals as shaded areas around lines.

Description

Uses an x-vector and a matrix of 3*N columns with estimates and ci.sto produce the lines of estimates and confidence intervals as shadedareas in transparent colours around the lines of the estimates.

Usage

matshade( x, y, lty = 1,          col = 1:(ncol(y)/3), col.shade=col, alpha=0.15,         plot = dev.cur()==1,          ... )

Arguments

x

Numerical vector. Unlikematplot this can only be a vector.

y

A matrix with 3*N columns — representing estimates andconfidence bounds for N curves. Order of columns are assumed to be(est,lo,hi,est,lo,hi...) (or (est,hi,lo...))

lty

Line types for the curves.

col

Color(s) of the estimated curves.

col.shade

Color(s) of the shaded areas. These are the colorsthat are made transparent by thealpha factor. Defaults tothe same colors as the lines.

alpha

Number in [0,1] indicating the transparency of the colors forthe confidence intervals. Larger values makes the shadesdarker. Can be a vector which then applies to the curves in turn.

plot

Logical. Should a new plot frame be started? If no deviceis active, the default is to start one, and plot allys versusx in transparent color. On the rare occasion a device is open, butno plot have been called you will get an error telling that plot.newhas not been called yet, in which case you should explicitly setplot toTRUE.

...

Arguments passed on tomatplot (ifplot=TRUE) andmatlines for use when plottingthe lines. Note thatlwd=0 will cause lines to be omitted andonly the shades be plotted.

Details

All shaded areas are plotted first, the curves addedafterwards, so that lines are not 'overshadowed'.

If there are NAs inx ory there will be separate shadedareas for each non-NA sequence. Applies separately to each setof confidence bands iny.

Note that if you repeat the same command, you will get the curvesand the shaded areas overplotted in the same frame, so the effect is tohave the shades darker, because the transparent colors are plotted ontop of those from the first command.

Value

NULL. Used for its side effects.

Author(s)

Bendix Carstensen,http://bendixcarstensen.com

Examples

# Follow-up data of Danish DM patientsdata( DMlate )mL <- Lexis( entry=list(age=dodm-dobth,per=dodm),              exit=list(per=dox),       exit.status=factor(!is.na(dodth),labels=c("Alive","Dead")),              data=DMlate )# Split follow-up and model by splinessL <- splitLexis( mL, breaks=0:100, time.scale="age")## Not run: # the same thing with popEpisL <- splitMulti( mL, age=0:100 )            ## End(Not run)# Mortality rates separately for M and F:mort <- glm( (lex.Xst=="Dead") ~ sex*Ns(age,knots=c(15,3:8*10)),             offset = log(lex.dur),             family = poisson,               data = sL )## Not run: # The counterpart with gamlibrary( mgcv )mort <- gam( (lex.Xst=="Dead") ~ s(age,by=sex) + sex,             offset = log(lex.dur),             family = poisson,               data = sL )       ## End(Not run)# predict rates (per 1000 PY) for men and womenndM <- data.frame( age=10:90, sex="M", lex.dur=1 )ndF <- data.frame( age=10:90, sex="F", lex.dur=1 )# gam objects ignores the offset in prediction so# lex.dur=1000 in prediction frame wll not work.prM <- ci.pred( mort, ndM )*1000prF <- ci.pred( mort, ndF )*1000# predict rate-ratioMFr <- ci.exp( mort, ctr.mat=list(ndM,ndF) )# plot lines with shaded confidence limits# for illustration we make a holes for the RRs:MFr[40:45,2] <- NAMFr[44:49,1] <- NAmatshade( ndM$age, cbind( MFr, prF, prM ), col=c(1,2,4), lwd=3,          log="y", xlab="Age", ylab="Mortality per 1000 PY (and RR)" )abline( h=1 )

Cut follow-up at multiple event dates and keep track of order of events

Description

A generalization ofcutLexis to the case where differentevents may occur in any order (but at most once for each).

Usage

mcutLexis( L0, timescale = 1, wh,              new.states = NULL,        precursor.states = transient(L0),              seq.states = TRUE,              new.scales = NULL,            ties.resolve = FALSE )

Arguments

L0

A Lexis object.

timescale

Which time scale do the variables inL0[,wh]refer to. Can be character or integer.

wh

Which variables contain the event dates. Character orinteger vector

new.states

Names of the events forming new states. IfNULL equal to the variable names fromwh.

precursor.states

Which states are precursor states. SeecutLexis for definition of precursor states.

seq.states

Should the sequence of events be kept track of? Thatis, should A-B be considered different from B-A. IfFALSE,the state with combined preceding events A and B will be calledA+B (alphabetically sorted).

May also be supplied as character:s - sequence, keeptrack of sequence of states occupied (same asTRUE),u- unordered, keep track only of states visited (same asFALSE) orl,c - last or current state, onlyrecord the latest state visited. If given as character, only thefirst letter converted to lower case is used.

new.scales

Should we construct new time scales indicating thetime since each of the event occurrences.

ties.resolve

Should tied event times be resolved by addingrandom noise to tied event dates. IfFALSE the function willnot accept that two events occur at the same time for a person(ties). IfTRUE a random quantity in the rangec(-1,1)/100 will be added to all event times in all recordswith at least one tie. Ifties.resolve is numeric a randomquantity in the rangec(-1,1)*ties.resolve will be added toall event times in all records with at least one tie.

Value

ALexis object with extra states created byoccurrence of a number of intermediate events.

Author(s)

Bendix Carstensen,http://bendixcarstensen.com

Examples

# A dataframe of timesset.seed(563248)dd <- data.frame( id = 1:30,                 doN = round(runif(30,-30, 0),1),                 doE = round(runif(30,  0,20),1),                 doX = round(runif(30, 50,60),1),                 doD = round(runif(30, 50,60),1),                 # these are the event times                 doA = c(NA,21,NA,27,35,NA,52, 5,43,80,                         NA,22,56,28,53,NA,51, 5,43,80,                         NA,23,NA,33,51,NA,55, 5,43,80),                 doB = c(NA,20,NA,53,27,NA, 5,52,34,83,                         NA,20,23,37,35,NA,52, 8,33,NA,                         25,NA,37,40,NA,NA,15,23,36,61) )# set up a Lexis object with time from entry to death/exitLx <- Lexis( entry = list(time=doE,                           age=doE-doN),              exit = list(time=pmin(doX,doD)),       exit.status = factor(doD<doX,labels=c("OK","D")),              data = dd )summary( Lx )# cut the follow-up at dates doA and doBL2 <- mcutLexis( Lx, "time", wh=c("doA","doB"),                 new.states = c("A","B"),           precursor.states = "OK",                 seq.states = TRUE,                 new.scales = c("tfA","tfB") )summary( L2 )L2# show the statesboxes( L2, boxpos=list(x=c(10,60,50,90,50,90),                       y=c(50,50,90,90,10,10)),           scale.R=100, show.BE=TRUE, DR.sep=c(" (",")"))L3 <- mcutLexis( Lx, "time", wh=c("doA","doB"),                 new.states = c("A","B"),           precursor.states = "OK",                 seq.states = FALSE,                 new.scales = c("tfA","tfB") )summary( L3 )boxes( L3, boxpos=list(x=c(10,50,50,90,50),                       y=c(50,50,90,50,10)),           show.R=FALSE, show.BE=TRUE )

Merge a Lexis object with a data frame

Description

Merge additional variables from a data frame into a Lexis object.

Usage

## S3 method for class 'Lexis'merge(x, y, id, by, ...)

Arguments

x

an object of classLexis

y

a data frame

id

the name of the variable iny to use for matchingagainst the variablelex.id inx.

by

if matching is not done by id, a vector of variable namescommon to bothx andy

...

optional arguments to be passed tomerge.data.frame

Details

ALexis object can be considered as an augmented data framein which some variables are time-dependent variables representingfollow-up. TheLexis function produces a minimal objectcontaining only these time-dependent variables. Additional variablesmay be added to aLexis object using themerge method.

Value

ALexis object with additional columns taken from themerged data frame.

Note

The variable given as theby.y argument must not containany duplicate values in the data framey.

Author(s)

Martyn Plummer

Mantel-Haenszel analyses of cohort and case-control studies

Description

This function carries out Mantel-Haenszel comparisons in tabulated data derived from both cohort and case-control studies.

Usage

mh(cases, denom, compare=1, levels=c(1, 2), by=NULL,     cohort=!is.integer(denom), confidence=0.9)## S3 method for class 'mh'print(x, ...)

Arguments

cases

the table of case frequencies (a multiway array).

denom

the denominator table. For cohort studies this should be a table of person-years observation, while for case-control studies it should be a table of control frequencies.

compare

the dimension of the table which defines the comparison groups (can be referred to either by number or by name). The default is the first dimension of the table.

levels

a vector identifying (either by number or by name) the two groups to be compared. The default is the first two levels of the selected dimension.

by

the dimensions not to be collapsed in the Mantel-Haenszel computations. Thus, this argument defines the structure of the resulting tables ofestimates and tests.

cohort

an indicator whether the data derive from a cohort or a case-control study. If the denominator table is stored as an integer, a case-controlstudy is assumed.

confidence

the approximate coverage probability for the confidence intervals to be computed.

x

amh object

...

arguments passed on toprint

Details

Multiway tables of dataare accepted and any two levels of any dimension can be chosen as definingthe comparison groups. The rate (odds) ratio estimates and the associatedsignificance tests may be collapsed over all the remaining dimensions of the table, or over selected dimensions only, so that tables of estimates and tests are computed.

Value

A list of classmh giving tables of rate (odds) ratio estimates,their standard errors (on a log scale), lower and upper confidencelimits, chi-squared tests (1 degree of freedom) and the correspondingp-values. The result list also includes numerator and denominator of theMantel-Haenszel estimates (q, r), and score test statistics and scorevariance (u, v).

Side Effects

None

References

Clayton, D. and Hills, M. : Statistical Models in Epidemiology, Oxford University Press (1993).

Examples

# If d and y are 3-way tables of cases and person-years # observation formed by tabulation by two confounders # (named "C1" and "C2") an exposure of interest ("E"), # the following command will calculate an overall # Mantel-Haenszel comparison of the first two exposure # groups.## Generate some bogus datadnam <- list( E=c("low","medium","high"), C1=letters[1:2], C2=LETTERS[1:4] )d <- array( sample( 2:80, 24),            dimnames=dnam, dim=sapply( dnam, length ) )y <- array( abs( rnorm( 24, 227, 50 ) ),            dimnames=dnam, dim=sapply( dnam, length ) )mh(d, y, compare="E")## Or, if exposure levels named "low" and "high" are to be # compared and these are not the first two levels of E :#mh(d, y, compare="E", levels=c("low", "high"))## If we wish to carry out an analysis which controls for C1, # but examines the results at each level of C2:#mh(d, y, compare="E", by="C2")## It is also possible to look at rate ratios for every # combination of C1 and C2 :#mh(d, y, compare="E", by=c("C1", "C2"))## If dimensions and levels of the table are unnamed, they must # be referred to by number.#

Fit intensity models to follow-up data in Lexis objects

Description

Modeling intensities based on Lexis objects, exploiting the structure of theLexis objects where the events and risk time have predefinedrepresentations. This allows a simpler syntax than thetraditional explicit modeling usingglm,gamandcoxph. Requires thatlex.Cst andlex.Xstare defined as factors.

But it is just a set of wrappers forglm,gam andcoxph.

Usage

glmLexis(Lx, formula,         from = preceding(Lx, to), to = absorbing(Lx),         paired = FALSE, link = "log", scale = 1, verbose = TRUE, ... )     gamLexis(Lx, formula,         from = preceding(Lx, to), to = absorbing(Lx),         paired = FALSE, link = "log", scale = 1, verbose = TRUE, ... )     coxphLexis(Lx, formula,           from = preceding(Lx, to), to = absorbing(Lx),           paired = FALSE, verbose = TRUE, ... )       glm.Lexis( Lx,         # Lexis object        formula,         # ~ model           from = preceding(Lx, to), # 'from' states             to = absorbing(Lx)    , # 'to' states         paired = FALSE, # only the pairwise           link = "log", # link function          scale = 1,     # scaling of PY        verbose = TRUE,  # report what is done?            ... )        # further arguments to glm  gam.Lexis( Lx,         # Lexis object        formula,         # ~ model           from = preceding(Lx, to), # 'from' states             to = absorbing(Lx)    , # 'to' states         paired = FALSE, # only the pairwise           link = "log", # link function          scale = 1,     # scaling of PY        verbose = TRUE,  # report what is done?            ... )        # further arguments to gamcoxph.Lexis( Lx,         # Lexis object        formula,         # timescale ~ model           from = preceding(Lx, to), # 'from' states             to = absorbing(Lx)    , # 'to' states         paired = FALSE, # only the pairwise        verbose = TRUE,  # report what is done?            ... )        # further arguments to coxph

Arguments

Lx

ALexis object representing cohort follow-up.

formula

Model formula describing the model for theintensity(-ies). Forglm andgam, the formula should beone-sided; forcoxph the formula should be two-sided and havethe name of the time-scale used for baseline hazard as the l.h.s.

from

Character vector of statesfrom which transitionsare considered. May also be an integer vector in which case thereference will be to the position of levels oflex.Cst. Defaults to the collection of transient statesimmediately preceding the absorbing states.

to

Character vector of statesto which a transition isconsidered an event. May also be an integer vector in which case thereference will be to the position of levels oflex.Xst.Defaults to the set of absorbing states.

paired

Logical. Should the states mentioned into,rep.from be taken as pairs, indicating the only transitionsmodeled. IfFALSE all transitions from any of the states infrom to any states into are modeled.

link

Character; name of the link function used, allowed valuesare'log' (the default),'identity' and'sqrt',see the familypoisreg.

scale

Scalar.lex.dur is divided by this number before analysis, so that you can get resulting rates on a scale of your wish.

verbose

Print information on the states modeled?

...

Further arguments passed on toglm,glm orcoxph

Details

The functions with and without dots in the name are identical

Theglm andgam models are fitted using the familypoisreg which is a bit faster than the traditionalpoisson family. The response variable for this family is atwo-column vector of events and person-time respectively, so thepredictions, for example usingci.pred does not requirelex.dur (and would ignore this) as variable in thenewdata.ci.pred will return the estimated rates inunits of thelex.dur in theLexis object, scaled byscale, which has a default value of 1.

The default is to model all transitions into any absorbing state bythe same model (how wise is that??). If onlyfrom is given,to is set to all states reachable fromfrom, which maybe a really goofy model and if so a warning is issued. If onlyto is given,from is set to the collection of statesfrom whichto can be reached directly — seepreceding and its cousins. This convention means that ifyou have aLexis object representing a simple survivalanalysis, with states, say, "alive" and "dead", you can dispense withthefrom andto arguments.

Occasionally you only want to model a subset of the possibletransitions from states infrom to states into, inwhich case you specifyfrom andto as character vectorsof the same length and setpaired=TRUE. Then only transitionsfrom[i] toto[i],i=1,2,... will be modeled.

There is no workingupdate functions for these objects (yet).

Strictly speaking, it is a bit counter-intuitive to have the time-scaleon the l.h.s. of the formula for thecoxph since the time scaleis also a predictor of the occurrence rate. On the other hand, callingcoxph directly would also entail having the name of the timescale in theSurv object on the l.h.s. of the formula. So theinconsistency is merely carried over fromcoxph.

Value

glmLexis returns aglm object, which isalso of classglm.lex,gamLexis returns agam object, which isalso of classgam.lex, andcoxphLexis returns acoxph object, which isalso of classcoxph.lex. These extra class attributes are meantto facilitate the (still pending) implementation of anupdate function.

The returned objects all have an extra attribute,Lexis whichis a list with entriesdata, the name of theLexis object modeled (note that itisnot the object, only the name of it, which may not be portable);trans, a character vector of transitions modeled;formula, the model formula; andscale, the scaling applied tolex.dur before modeling.

Only theglm andgam objects have thescale elementin the list; a scalar indicating the scaling oflex.dur beforemodeling. Note that the formula component of theLexisattribute of acoxph object is atwo-sided formula with the baseline time scale as the l.h.s.

Author(s)

Bendix Carstensen,http://bendixcarstensen.com.

Examples

library( Epi )library( survival )data( DMlate )# Lexis object of total follow-upmL <- Lexis( entry = list(age=dodm-dobth,per=dodm),              exit = list(per=dox),       exit.status = factor(!is.na(dodth),labels=c("Alive","Dead")),              data = DMlate )# Cut follow-up at start of insulin usecL <- cutLexis( mL, cut = mL$doins,              timescale = "per",              new.state = "Ins",       precursor.states = "Alive" )# Split follow-up on age-axissystem.time( sL <- splitLexis( cL, breaks=0:25*4, time.scale="age") )# ( consider splitMulti from the popEpi package )summary( sL )# glm models for rates based on the time-split dataset by insulin and sex# Proportional hazards model with insulin as time-dependent variable# - uses the defaul of modeling all transitions from both transient# states ("Alive" and "Ins") to the absorbing state ("Dead"). mt <- glmLexis( sL, ~ sex + lex.Cst + Ns(age,knots=c(15,3:8*10)) )# prediction of mortality rates from "Alive" with and without PH assumptionnA <- data.frame( age=40:70, sex="M", lex.Cst="Alive" )nI <- data.frame( age=40:70, sex="M", lex.Cst="Ins" )matshade( nA$age, cbind( ci.pred(mt,nA),                         ci.pred(mt,nI) )*1000, plot=TRUE,          lwd=3, lty=1, log="y", col=c("black","blue","red"),          xlab="Age", ylab="Mortality per 1000 PY" ) # gam models may take some time to run so we leave it out## Not run: mt.gam <- gamLexis( sL, ~ sex + lex.Cst + s(age), to="Dead",                     scale=1000 )        ## End(Not run)# Fit a Cox model for mortality with age as baseline time scale and# insulin (lex.Cst) as time-dependent covariate mt.cox <- coxphLexis( sL, age ~ sex + lex.Cst, c("Alive","Ins"), "Dead" )# Pretty much the same results for regression paramters as the glm:  ci.exp( mt    , subset="ex" )# ci.exp( mt.gam, subset="ex" )  ci.exp( mt.cox, subset="ex" )

Population mortality rates for Denmark in 1-year age-classes.

Description

ThemortDK data frame has 1820 rows and 21 columns.

Format

This data frame contains the following columns:

`age`:	Age class, 0--89, 90:90+.
`per`:	Calendar period, 38: 1938--42, 43: 1943--47, ..., 88:1988-92.
`sex`:	Sex, 1: male, 2: female.
`risk`:	Number of person-years in the Danish population.
`dt`:	Number of deaths.
`rt`:	Overall mortality rate in cases per 1000 person-years, i.e.`rt=1000*dt/risk`
	Cause-specific mortality rates in cases per 1000 person-years:
`r1`:	Infections
`r2`:	Cancer.
`r3`:	Tumors, benign, unspecific nature.
`r4`:	Endocrine, metabolic.
`r5`:	Blood.
`r6`:	Nervous system, psychiatric.
`r7`:	Cerebrovascular.
`r8`:	Cardiac.
`r9`:	Respiratory diseases, excl. cancer.
`r10`:	Liver, excl. cancer.
`r11`:	Digestive, other.
`r12`:	Genitourinary.
`r13`:	Ill-defined symptoms.
`r14`:	All other, natural.
`r15`:	Violent.

Source

Statistics Denmark, National board of health provided original data. Michael Andersson grouped the causes of death.

Examples

data(mortDK)

Function to group a variable in intervals.

Description

Cuts a continuous variable in intervals. As opposed tocutwhich returns a factor,ncut returns a numeric variable.

Usage

ncut(x, breaks, type="left" )

Arguments

x

A numerical vector.

breaks

Vector of breakpoints.NA will results for valuesbelowmin(breaks) iftype="left", for valuesabovemax(breaks) iftype="right" and for valuesoutsiderange(breaks) iftype="mid"

type

Character: one ofc("left","right","mid"),indicating whether the left, right or midpoint of the intervalsdefined in breaks is returned.

Details

The function uses the base functionfindInterval.

Value

A numerical vector of the same length asx.

Author(s)

Bendix Carstensen, Steno Diabetes Center,b@bxc.dk,http://bendixcarstensen.com, with essential inputfrom Martyn Plummer,martyn.plummer@r-project.org

Examples

br <- c(-2,0,1,2.5)x <- c( rnorm( 10 ), br, -3, 3 )cbind( x, l=ncut( x, breaks=br, type="l" ),          m=ncut( x, breaks=br, type="m" ),          r=ncut( x, breaks=br, type="r" ) )[order(x),]x <- rnorm( 200 )plot( x, ncut( x, breaks=br, type="l" ), pch=16, col="blue", ylim=range(x) )abline( 0, 1 )abline( v=br )points( x, ncut( x, breaks=br, type="r" ), pch=16, col="red" )points( x, ncut( x, breaks=br, type="m" ), pch=16, col="green" )

Nice breakpoints for axes on plots

Description

The function callsprettyfor linear scale. For a log-scale nice are computed using a set ofspecified number in each decade.

Usage

nice(x, log = FALSE, lpos = c(1, 2, 5), xmx = 4, ...)

Arguments

x

Numerical vector to

log

Logical. Is the scale logartimic?

lpos

Numeric. Numbers between 1 and 10 giving the desiredbreakpoints in this interval.

xmx

Numeric. The maximal (absolute) power of 10 to be used fora log-scale.

...

Arguments passed on topretty iflog=FALSE

Value

A vector of breakpoints.

Author(s)

Bendix Carstensen,b@bxc.dk,http://bendixcarstensen.com

Examples

nice( exp( rnorm( 100 ) ), log=TRUE )

A Cohort of Nickel Smelters in South Wales

Description

Thenickel data frame has 679 rows and 7 columns.The data concern a cohort of nickel smelting workers in South Wales andare taken from Breslow and Day, Volume 2. For comparison purposes,England and Wales mortality rates (per 1,000,000 per annum)from lung cancer (ICDs 162 and 163),nasal cancer (ICD 160), and all causes, by age group and calendar period, aresupplied in the datasetewrates.

Format

This data frame contains the following columns:

`id`:	Subject identifier (numeric)
`icd`:	ICD cause of death if dead, 0 otherwise (numeric)
`exposure`:	Exposure index for workplace (numeric)
`dob`:	Date of birth (numeric)
`age1st`:	Age at first exposure (numeric)
`agein`:	Age at start of follow-up (numeric)
`ageout`:	Age at end of follow-up (numeric)

Source

Breslow NE, and Day N, Statistical Methods in Cancer Research. VolumeII: The Design and Analysis of Cohort Studies. IARC ScientificPublications, IARC:Lyon, 1987.

Examples

data(nickel)str(nickel)

A small occupational cohort

Description

This is the data that is behind the illustrative Lexisdiagram in Breslow & Day's book on case-control studies.

Usage

data(occup)

Format

A data frame with 13 observations on the following 4 variables.

AoE: a numeric vector, Age at Entry
DoE: a numeric vector, Date of entry
DoX: a numeric vector, Date of eXit
Xst: eXit statusD-event,W-withdrawal,X-censoring

References

Breslow & Day: Statistical Methods in Cancer Research, vol 1:The analysis of case-control studies, figure 2.2, p. 48.

Examples

data(occup)lx <- Lexis( entry = list( per=DoE, age=AoE ),              exit = list( per=DoX ),      entry.status = "W",       exit.status = Xst,              data = occup )plot( lx )# Split follow-up in 5-year classessx <- splitLexis( lx, seq(1940,1960,5), "per" )sx <- splitLexis( sx, seq(  40,  60,5), "age" )plot( sx )# Plot with a bit more paraphernalia and a device to get# the years on the same physical scale on both axesypi <- 2.5 # Years per inchdev.new( height=15/ypi+1, width=20/ypi+1 ) # add an inch in each direction forpar( mai=c(3,3,1,1)/4, mgp=c(3,1,0)/1.6 )  # the margins set in inches by mai=plot(sx,las=1,col="black",lty.grid=1,lwd=2,type="l",     xlim=c(1940,1960),ylim=c(40,55),xaxs="i",yaxs="i",yaxt="n",     xlab="Calendar year", ylab="Age (years)")axis( side=2, at=seq(40,55,5), las=1 )points(sx,pch=c(NA,16)[(sx$lex.Xst=="D")+1] )box()# Annotation with the person-yearsPY.ann.Lexis( sx, cex=0.8 )

Generate paths traveled through a Lexis multistate model data frame.

Description

Paths that a person traveled through states in aLexismultistate model.

Usage

## S3 method for class 'Lexis'paths(Lx, dfr = FALSE, ...)

Arguments

Lx

ALexis object

dfr

Logical. Should results be returned as a data frame withcolumnslex.id andpath?

...

Arguments passed on. Ignored

Value

A factor with levels describing each person's path through states. It isof lengthlength(nid(Lx)), named by the (character)values ofLx$lex.id. Ifdfr isTRUE a two-columndata frame is returned.

Author(s)

Bendix Carstensen,b@bxc.dk,http://bendixcarstensen.com

Examples

# a simple example from the packageexample(DMlate)str(paths.Lexis(dmi, dfr = TRUE))str(pathD <- paths.Lexis(dmi))cbind(addmargins(table(pathD)))## an example with recurring eventsexample(steno2)summary(L4)str(pathS <- paths.Lexis(L4))cbind(addmargins(table(pathS)))

Plot period and cohort effects in an APC-frame.

Description

When an APC-frame has been produced byapc.frame, thisfunction draws curves or points in the period/cohort part of the frame.

Usage

  pc.points( x, y, ... )  pc.lines( x, y, ... )  pc.matpoints( x, y, ... )  pc.matlines( x, y, ... )  pc.matshade( x, y, ... )  cp.points( x, y, ... )  cp.lines( x, y, ... )  cp.matpoints( x, y, ... )  cp.matlines( x, y, ... )  cp.matshade( x, y, ... )

Arguments

x

vector ofx-coordinates.

y

vector or matrix ofy-coordinates.

...

Further parameters to be transmitted to points, lines,matpoints, matlines or matshade used for plotting curves in thecalendar time realm of a graph generated byapc.frame

Details

Since the Age-part of the frame is referred to by its realcoordinates plotting in the calendar time part requires translationand scaling to put things correctly there, that is done by thefunctionspc.points etc.

The functionscp.points etc. are just synonyms for these, inrecognition of the fact that you can never remember whether it is "pc"or "cp".

Value

The functions return nothing.

Author(s)

Bendix Carstensen, Steno Diabetes Center Copenhagen,http://bendixcarstensen.com

Create percentages in a table

Description

Computes percentages and a margin of totals along a given margin of a table.

Usage

pctab(TT, margin = length(dim(TT)), dec=1)

Arguments

TT

A table or array object

margin

Which margin should be the the total?

dec

How many decimals should be printed? If 0 orFALSEnothing is printed

Value

A table of percentages, where all dimensions except the one specifiedmarginhas two extra levels named "All" (where all entries are 100) and "N".The function prints the table withdec decimals.

Author(s)

Bendix Carstensen, Steno Diabetes Center,http://bendixcarstensen.com.

Examples

Aye <- sample( c("Yes","Si","Oui"), 177, replace=TRUE )Bee <- sample( c("Hum","Buzz"), 177, replace=TRUE )Sea <- sample( c("White","Black","Red","Dead"), 177, replace=TRUE )A <- table( Aye, Bee, Sea )Aftable( pctab( A ) )ftable( pctab( addmargins( A, 1 ), 3 ) )round( ftable( pctab( addmargins( A, 1 ), 3 ), row.vars=3 ), 1)

Lexis diagrams

Description

The follow-up histories represented by a Lexis object can be plottedusing one or two dimensions. The two dimensional plot is a Lexisdiagram showing follow-up time simultaneously on two time scales.

Usage

## S3 method for class 'Lexis'plot(x=Lexis( entry=list(Date=1900,Age=0), exit=list(Age=0) ),                     time.scale = NULL, type="l", breaks="lightgray", ...)## S3 method for class 'Lexis'points(x, time.scale = options()[["Lexis.time.scale"]] , ...)## S3 method for class 'Lexis'lines(x, time.scale = options()[["Lexis.time.scale"]], ...)## S3 method for class 'Lexis'PY.ann(x, time.scale = options()[["Lexis.time.scale"]], digits=1, ...)

Arguments

x

An object of classLexis. The default is a bogusLexisobject, so thatplot.Lexis can be called without the firstargument and still produce a(n empty) Lexis diagram. Unless argumentsxlim andylim are given in this case the diagram islooking pretty daft.

time.scale

A vector of length 1 or 2 giving the time scales tobe plotted either by name or numerical order

type

Character indication what to draw: "n" nothing (just set up thediagram), "l" - liefelines, "p" - endpoints of follow-up, "b" - bothlifelines and endpoints.

breaks

a string giving the colour of grid lines to be drawnwhen plotting a split Lexis object. Grid lines can be suppressed bysupplying the valueNULL to thebreaks argument

digits

Numerical. How many digits after the demimal points should bewhen plotting the person-years.

...

Further graphical parameters to be passed to the plottingmethods.

Grids can be drawn (behind the life lines) using the followingparameters inplot:

grid If logical, a background grid is set upusing the axis ticks. If a list, the first component is used aspositions for the vertical lines and the last as positions for thehorizontal. If a nunerical vector, grids on both axes are set upusing the distance between the numbers.
col.grid="lightgray" Color of the background grid.
lty.grid=2 Line type for the grid.
coh.grid=FALSE Should a 45 degree grid be plotted?

Details

The plot method forLexis objects traces “life lines” fromthe start to the end of follow-up. Thepoints method plotspoints at the end of the life lines.

Iftime.scale is of length 1, the life lines are drawnhorizontally, with the time scale on the X axis and the id value on the Yaxis. Iftime.scale is of length 2, a Lexis diagram isproduced, with diagonal life lines plotted against both time scalessimultaneously.

Iflex has been split along one of the time axes by a call tosplitLexis, then vertical or horizontal grid lines are plotted(on top of the life lines) at the break points.

PY.ann writes the length of each (segment of) life line at the middleof the line. Not advisable to use with large cohorts. Another example isin the example file foroccup.

Author(s)

Martyn Plummer

Examples

# A small bogus cohortxcoh <- structure( list( id = c("A", "B", "C"),                      birth = c("14/07/1952", "01/04/1957", "10/06/1987"),                      entry = c("04/08/1965", "08/09/1972", "23/12/1991"),                       exit = c("27/06/1997", "23/05/1995", "24/07/1998"),                       fail = c(1, 0, 1) ),                     .Names = c("id", "birth", "entry", "exit", "fail"),                  row.names = c("1", "2", "3"),                      class = "data.frame" )# Convert the character dates into numerical variables (fractional years)xcoh$bt <- cal.yr( xcoh$birth, format="%d/%m/%Y" )xcoh$en <- cal.yr( xcoh$entry, format="%d/%m/%Y" )xcoh$ex <- cal.yr( xcoh$exit , format="%d/%m/%Y" )# See how it looksxcoh# Define as Lexis object with timescales calendar time and ageLcoh <- Lexis( entry = list( per=en ),                exit = list( per=ex, age=ex-bt ),         exit.status = fail,                data = xcoh )# Default plot of follow-upplot( Lcoh )# Show follow-up timePY.ann( Lcoh )# Show exit statusplot( Lcoh, type="b" )# Same but failures onlyplot( Lcoh, type="b", pch=c(NA,16)[Lcoh$fail+1] )# With a grid and deaths as endpointsplot( Lcoh, grid=0:10*10, col="black" )points( Lcoh, pch=c(NA,16)[Lcoh$lex.Xst+1] )# With a lot of bells and whistles:plot( Lcoh, grid=0:20*5, col="black", xaxs="i", yaxs="i",      xlim=c(1960,2010), ylim=c(0,50), lwd=3, las=1 )points( Lcoh, pch=c(NA,16)[Lcoh$lex.Xst+1], col="red", cex=1.5 )

Plot the estimates from a fitted Age-Period-Cohort model

Description

This function plots the estimates created byapc.fit in a singlegraph. It just callsapc.frame after computing some sensiblevalues of the parameters, and subsequently plots the estimates usingapc.lines.

Usage

## S3 method for class 'apc'plot( x, r.txt="Rate", ...)          apc.plot( x, r.txt="Rate", ...)

Arguments

x

An object of classapc.

r.txt

The text to put on the vertical rate axis.

...

Additional arguments passed on toapc.lines.

Details

plot.apc is just a wrapper forapc.plot.

Value

A numerical vector of length two, with namesc("cp.offset","RR.fac"). The first is the offset for the cohortperiod-axis, the second the multiplication factor for the rate-ratioscale. Therefore, if you want to plot at(x,y) in the right panel,use(x-res["cp.offset"],y/res["RR.fac"])=(x-res[1],y/res[2]).This vector should be supplied for the parameterframe.par toapc.lines if more sets of estimates is plotted in thesame graph, however seecp.points.

Author(s)

Bendix Carstensen, Steno Diabetes Center,http://bendixcarstensen.com

Examples

data( lungDK )apc1 <- apc.fit( transform( lungDK,                            A = Ax, P = Px, Y = Y/10^5 ),                 ref.c = 1920 )fp <- apc.plot( apc1 )apc.lines( apc1, frame.par=fp, drift=1.01, col="red" )for( i in 1:11 )  apc.lines( apc1, frame.par=fp, drift=1+(i-6)/100, col=rainbow(12)[i] )

Plotting Aalen-Johansen curves for competing events

Description

FunctionplotCIF plots, for one or more groups, thecumulative incidence curves for a selected event out of two or morecompeting events. FunctionstackedCIF plots, for one group orpopulation, the cumulative incidence curves for two or more competingevents such that the cumulative incidences are stacked upon eachother. The CIFs are are estimated by the Aalen-Johansen method.

Usage

## S3 method for class 'survfit' plotCIF( x, event = 1,              xlab = "Time",              ylab = "Cumulative incidence",              ylim = c(0, 1),               lty = 1,               col = "black", ... )## S3 method for class 'survfit'stackedCIF( x, group = 1,                 col = "black",                fill = "white",                ylim = c(0,1),                xlab = "Time",                ylab = "Cumulative incidence", ... )

Arguments

x

An object of classsurvfit, thetype ofevent inSurv() being "mstate"; the first levelof the event factor represents censoring and the remaining ones thealternative competing events.

event

Determines the event for which the cumulative incidencecurve is plotted byplotCIF.

group

An integer showing the selected level of a possiblegrouping factor appearing in the model formula insurvfit whenplotting bystackedCIF

col

A vector specifying the plotting color(s) of the curve(s) forthe different groups inplotCIF– default: all "black".

fill

A vector indicating the colours to be used for shading theareas pertinent to the separate outcomes instackedCIF- default: all"white".

xlab

Label for the $x$-axis.

ylab

Label for the $y$-axis.

ylim

Limits of the $y$-axis.

lty

A vector specifying the line type(s) of the curve(s) forthe different groups - default: all 1 (=solid).

...

Further graphical parameters to be passed.

Details

The order in which the curves withstackedCIF are piledupon each other is the same as the ordering of the values or levels ofthe competing events in the pertinent event variable. The ordering canbe changed by permuting the levels as desired using functionRelevel, after whichsurvfit is called with the relevelledevent variable inSurv()

Value

No value is returned but a plot is produced as a side-effect.

Note

Aalen-Johansen curves for competing events in several groups can alsobe plotted by functionplot.survfit of the survivallibrary as well as by some functions in other packages covering analysisof time-to-event data.

Author(s)

Esa Laara,esa.laara@oulu.fi

References

Putter, H., Fiocco, M., Geskus, R.B. (2007). Tutorial in biostatistics: competing risks and multi-state models. Statistics in Medicine, 26: 2389–2430.

Examples

library(survival)   #  requires version 2.39-4 or laterhead(mgus1)#  Aalen-Johansen estimates of CIF are plotted by sex for two #  competing events: (1) progression (pcm), and (2) death, in #  a cohort of patients with monoclonal gammopathy.#  The data are actually covering transitions from pcm to death, too,#  for those entering the state of pcm. Such patients have two rows#  in the data frame, and in their 2nd row the 'start' time is #  the time to pcm (in days). #  In our analysis we shall only include those time intervals with value 0#  for variable 'start'. Thus, the relevant follow-up time is represented #  by variable 'stop' (days). For convenience, days are converted to years.fitCI <- survfit(Surv(stop/365.25, event, type="mstate") ~ sex,              data= subset(mgus1, start==0) )par(mfrow=c(1,2))plotCIF(fitCI, event = 1, col = c("red", "blue"),  main = "Progression", xlab="Time (years)" )text( 38, 0.15, "Men", pos = 2)text( 38, 0.4, "Women", pos = 2)plotCIF(fitCI, event = 2, col = c("red", "blue"),   main = "Death", xlab="Time (years)" )text( 38, 0.8, "Men", pos = 2)text( 38, 0.5, "Women", pos = 2)par(mfrow=c(1,2))stackedCIF(fitCI, group = 1, fill = c("gray80", "gray90"),  main = "Women", xlab="Time (years)" )text( 36, 0.15, "PCM", pos = 2)text( 36, 0.6, "Death", pos = 2)stackedCIF(fitCI, group = 2, fill = c("gray80", "gray90"),   main = "Men", xlab="Time (years)" )text( 39, 0.10, "PCM", pos = 2)text( 39, 0.6, "Death", pos = 2)

Plot estimates with confidence limits (forest plot)

Description

Plots parameter estimates with confidence intervals, annotated withparameter names. A dot is plotted at the estimate and a horizontalline extending from the lower to the upper limit is superimposed.

Usage

plotEst( ests,            y = dim(ests)[1]:1,          txt = rownames(ests),       txtpos = y,         ylim = range(y)-c(0.5,0),         xlab = "",         xtic = nice(ests[!is.na(ests)], log = xlog),         xlim = range( xtic ),         xlog = FALSE,          pch = 16,          cex = 1,          lwd = 2,          col = "black",      col.txt = "black",     font.txt = 1,    col.lines = col,   col.points = col,         vref = NULL,         grid = FALSE,     col.grid = gray(0.9),  restore.par = TRUE,          ... )linesEst( ests, y = dim(ests)[1]:1, pch = 16, cex = 1, lwd = 2,          col="black", col.lines=col, col.points=col, ... )pointsEst( ests, y = dim(ests)[1]:1, pch = 16, cex = 1, lwd = 2,          col="black", col.lines=col, col.points=col, ... )

Arguments

ests

Matrix with three columns: Estimate, lower limit, upperlimit. If a model object is supplied,ci.lin isinvoked for this object first.

y

Vertical position of the lines.

txt

Annotation of the estimates. Either a character vector oran expression vector.

txtpos

Vertical position of the text. Defaults toy.

ylim

Extent of the vertical axis.

xlab

Annotation of the horizontal axis.

xtic

Location of tickmarks on the x-axis.

xlim

Extent of the x-axis.

xlog

Should the x-axis be logarithmic?

pch

What symbol should be used?

cex

Expansion of the symbol.

col

Colour of the points and lines.

col.txt

Colour of the text annotating the estimates.

font.txt

Font for the text annotating the estimates.

col.lines

Colour of the lines.

col.points

Colour of the symbol.

lwd

Thickness of the lines.

vref

Where should vertical reference line(s) be drawn?

grid

If TRUE, vertical gridlines are drawn at thetickmarks. If a numerical vector is given vertical lines are drawnatgrid.

col.grid

Colour of the vertical gridlines

restore.par

Should the graphics parameters be restored? If settoFALSE the coordinate system will still be available foradditional plotting, andpar("mai") will still have the verylarge value set in order to make room for the labelling of theestimates.

...

Arguments passed on toci.lin when a model object issupplied asests.

Details

plotEst makes a news plot, whereaslinesEst andpointsEst (identical functions) adds to an existing plot.

If a model object of class"glm","coxph","clogistic" or"gnlm" is supplied the argumentxlog defaults toTRUE, and exponentiated estimates are extracted by default.

Value

NULL

Author(s)

Bendix Carstensen,b@bxc.dk,http://bendixcarstensen.com

Examples

# Bogus data and a linear modelf <- factor( sample( letters[1:5], 100, replace=TRUE ) )x <- rnorm( 100 )y <- 5 + 2 * as.integer( f ) + 0.8 * x + rnorm(100) * 2m1 <- lm( y ~ f )# Produce some confidence intervals for contrast to first level( cf <- ci.lin( m1, subset=-1 )[,-(2:4)] )# Plots with increasing amounts of bells and whistlespar( mfcol=c(3,2), mar=c(3,3,2,1) )plotEst( cf )plotEst( cf, grid=TRUE, cex=2, lwd=3 )plotEst( cf, grid=TRUE, cex=2, col.points="red", col.lines="green" )plotEst( cf, grid=TRUE, cex=2, col.points="red", col.lines="green",             xlog=TRUE, xtic=c(1:8), xlim=c(0.8,6) )rownames( cf )[1] <- "Contrast to fa:\n fb"plotEst( cf, grid=TRUE, cex=2, col.points=rainbow(4),                                col.lines=rainbow(4), vref=1 )#etxt <- expression("Plain text, qouted",                   "combined with maths:"*sqrt(a)*phi[c],                   f^d*"  Hb"*A[1][c],                   eff^e*"  kg/"*m^2)plotEst( cf, txt=etxt, grid=TRUE, cex=2, col.points=rainbow(4),                                         col.lines =rainbow(4), vref=1 )

Plot Equivalence Classes

Description

For interval censored data, segments of times between last.well and first.ill are plotted for each conversion in the data. It also plots the equivalence classes.

Usage

plotevent(last.well, first.ill, data)

Arguments

last.well

Time at which the individuals arelast seen negative for the event

first.ill

Time at which the individuals arefirst seen positive for the event

data

Data with a transversal shape

Details

last.well and first.ill should be written as character in the function.

Value

Graph

Author(s)

Delphine Maucort-Boulch, Bendix Carstensen, Martyn Plummer

References

Carstensen B. Regression models for interval censored survival data:application to HIV infection in Danish homosexual men.Stat Med. 1996 Oct30;15(20):2177-89.

Lindsey JC, Ryan LM. Tutorial in biostatistics methods forinterval-censored data.Stat Med. 1998 Jan 30;17(2):219-38.

Family Object for Poisson Regression

Description

Thepoisreg family allows Poisson regression models to befitted using theglm function.

In a Poisson regression model, we assume that the data arise from aPoisson process. We observe D disease events in follow up time Y andwish to estimate the incidence rate, which is assumed to be constantduring the follow-up period for any individual. The incidence ratevaries between individuals according to the predictor variables andthe link function in the model specification.

When using thepoisreg family in theglm function, theresponse should be specified as a two-column matrix with the firstcolumn giving the number of events (D) and the second column givingthe observation time (Y). This is similar to thebinomialfamily for which a two-column outcome can be used representing thenumber of successes and the number of failures.

Usage

poisreg(link = "log")

Arguments

link

a specification for the model link function. Thepoisreg family accepts the linksidentity,log andinverse.

Value

An object of class"family". Seefamilyfor details.

The family name, represented by the element"family" in thereturned object, is"poisson" and not"poisreg". This isnecessary to prevent thesummary.glm function from estimatingan overdispersion parameter (which should be fixed at 1) and thereforegiving incorrect standard errors for the estimates.

Note

When using the log link, Poisson regression can also be carried outusing thepoisson family by including the log follow-up timelog(Y) as an offset. However this approach does not generalizeto other link functions. Thepoisreg family allows more generallink functions including additive risk models withpoisreg(link = "identity").

Examples

  ## Estimate incidence rate of diabetes in Denmark (1996-2015) by  ## age and sex  data(DMepi)  DMepi$agegrp <- cut(DMepi$A, seq(from=0, to=100, by=5))  inc.diab <- glm(cbind(X, Y.nD) ~ -1 + agegrp + sex, family=poisreg,                  data=DMepi)  ## The coefficients for agegrp are log incidence rates for men in each  ## age group. The coefficient for sex is the log of the female:male  ## incidence rate ratio.  summary(inc.diab)  ## Smooth function with non-constant M/F RR:  requireNamespace("mgcv")  library( mgcv )  gam.diab <- gam( cbind(X, Y.nD) ~ s(A,by=sex) + sex,                   family=poisreg,                   data=DMepi)  ## There is no need/use for Y.nD in prediction data frames:  nM <- data.frame( A=20:90, sex="M" )  nF <- data.frame( A=20:90, sex="F" )  ## Rates are returned in units of (1 year)^-1, so we must scale the  ## rates by hand:   matshade( nM$A, cbind( ci.pred(gam.diab,     nM    )*1000,                         ci.pred(gam.diab,        nF )*1000,                         ci.exp( gam.diab,list(nM,nF)) ),            plot=TRUE, col=c("blue","red","black"),            log="y", xlab="Age", ylab="DM incidence rates per 1000     /     M vs. F RR" )  abline(h=1)

Diabetes prevance as of 2010-01-01 in Denmark

Description

Diabetes prevalence as of 2010-01-01 in Denmark in 1-year age classes by sex.

Usage

data("pr")

Format

A data frame with 200 observations on the following 4 variables.

A: Numeric, age, 0-99
sex: Sex, a factor with levelsMF
X: Number of diabetes patients
N: Population size

Examples

data(pr)str(pr)

Projection of columns of a matrix.

Description

Projects the columns of the matrixM on the space spanned by thecolumns of the matrixX, with respect to the inner productdefined byweight:<x|y>=sum(x*w*y).

Usage

projection.ip(X, M, orth = FALSE, weight = rep(1, nrow(X)))

Arguments

X

Matrix defining the space to project onto.

M

Matrix of columns to be projected. Must have the same numberof rows asX.

orth

Should the projection be on the orthogonal complement tospan(X)?

weight

Weights defining the inner product. Numerical vector oflengthnrow(X).

Value

A matrix of full rank with columns inspan(X)

Author(s)

Bendix Carstensen, Steno Diabetes Center,http://bendixcarstensen.com, with help from Peter Dalgaard.

Functions to plot rates from a table classified by age andcalendar time (period)

Description

Produces plots of rates versus age, connected within period or cohort(Aplot), rates versus period connected within age-groups(Pplot) and rates and rates versus date of birth cohort(Cplot).rateplot is a wrapper for these, allowingto produce the four classical displays with a single call.

Usage

rateplot( rates,          which = c("ap","ac","pa","ca"),            age = as.numeric( dimnames( rates )[[1]] ),            per = as.numeric( dimnames( rates )[[2]] ),           grid = FALSE,         a.grid = grid,         p.grid = grid,         c.grid = grid,          ygrid = grid,       col.grid = gray( 0.9 ),          a.lim = range( age, na.rm=TRUE ) + c(0,diff( range( age ) )/30),          p.lim = range( per, na.rm=TRUE ) + c(0,diff( range( age ) )/30),          c.lim = NULL,            ylim = range( rates[rates>0], na.rm=TRUE ),             at = NULL,         labels = paste( at ),          a.lab = "Age at diagnosis",          p.lab = "Date of diagnosis",          c.lab = "Date of birth",           ylab = "Rates",           type = "l",            lwd = 2,            lty = 1,         log.ax = "y",            las = 1,            ann = FALSE,          a.ann = ann,          p.ann = ann,          c.ann = ann,          xannx = 1/20,        cex.ann = 0.8,         a.thin = seq( 1, length( age ), 2 ),         p.thin = seq( 1, length( per ), 2 ),         c.thin = seq( 2, length( age ) + length( per ) - 1, 2 ),            col = par( "fg" ),          a.col = col,          p.col = col,          c.col = col,            ... )Aplot( rates, age = as.numeric( dimnames( rates )[[1]] ),       per = as.numeric( dimnames( rates )[[2]] ), grid = FALSE,       a.grid = grid, ygrid = grid, col.grid = gray( 0.9 ),       a.lim = range( age, na.rm=TRUE ), ylim = range( rates[rates>0], na.rm=TRUE ),       at = NULL, labels = paste( at ), a.lab = names( dimnames( rates ) )[1],       ylab = deparse( substitute( rates ) ), type = "l", lwd = 2, lty = 1,       col = par( "fg" ), log.ax = "y", las = 1, c.col = col, p.col = col,       c.ann = FALSE, p.ann = FALSE, xannx = 1/20, cex.ann = 0.8,       c.thin = seq( 2, length( age ) + length( per ) - 1, 2 ),       p.thin = seq( 1, length( per ), 2 ), p.lines = TRUE,       c.lines = !p.lines, ... )Pplot( rates, age = as.numeric( dimnames( rates )[[1]] ),       per = as.numeric( dimnames( rates )[[2]] ), grid = FALSE,       p.grid = grid, ygrid = grid, col.grid = gray( 0.9 ),       p.lim = range( per, na.rm=TRUE ) + c(0,diff(range(per))/30),       ylim = range( rates[rates>0], na.rm=TRUE ), p.lab = names( dimnames( rates ) )[2],       ylab = deparse( substitute( rates ) ), at = NULL, labels = paste( at ),       type = "l", lwd = 2, lty = 1, col = par( "fg" ), log.ax = "y",       las = 1, ann = FALSE, cex.ann = 0.8, xannx = 1/20,       a.thin = seq( 1, length( age ), 2 ), ... )Cplot( rates, age = as.numeric( rownames( rates ) ),       per = as.numeric( colnames( rates ) ), grid = FALSE,       c.grid = grid, ygrid = grid, col.grid = gray( 0.9 ),       c.lim = NULL, ylim = range( rates[rates>0], na.rm=TRUE ),       at = NULL, labels = paste( at ), c.lab = names( dimnames( rates ) )[2],       ylab = deparse( substitute( rates ) ), type = "l", lwd = 2, lty = 1,       col = par( "fg" ), log.ax = "y", las = 1, xannx = 1/20, ann = FALSE,       cex.ann = 0.8, a.thin = seq( 1, length( age ), 2 ), ...  )

Arguments

rates

A two-dimensional table (or array) with rates to be plotted. It isassumed that the first dimension is age and the second is period.

which

A character vector with elements fromc("ap","ac","apc","pa","ca"), indication which plots shouldbe produced. One plot per element is produced. The first letterindicates the x-axis of the plot, the remaining which groupsshould be connected, i.e."pa" will plot rates versusperiod and connect age-classes, and"apc" will plot ratesversus age, and connect both periods and cohorts.

age

Numerical vector giving the means of theage-classes. Defaults to the rownames ofrates as numeric.

per

Numerical vector giving the means of the periods. Defaultsto the columnnames ofrates as numeric.

grid

Logical indicating whether a background grid should be drawn.

a.grid

Logical indicating whether a background grid on theage-axis should be drawn. If numerical it indicates theage-coordinates of the grid.

p.grid

do. for the period.

c.grid

do. for the cohort.

ygrid

do. for the rate-dimension.

col.grid

The colour of the grid.

a.lim

Range for the age-axis.

p.lim

Range for the period-axis.

c.lim

Range for the cohort-axis.

ylim

Range for the y-axis (rates).

at

Position of labels on the y-axis (rates).

labels

Labels to put on the y-axis (rates).

a.lab

Text on the age-axis. Defaults to "Age".

p.lab

Text on the period-axis. Defaults to "Date of diagnosis".

c.lab

Text on the cohort-axis. Defaults to "Date of birth".

ylab

Text on the rate-axis. Defaults to the name of the rate-table.

type

How should the curves be plotted. Defaults to"l".

lwd

Width of the lines. Defaults to 2.

lty

Which type of lines should be used. Defaults to 1, a solid line.

log.ax

Character with letters from"apcyr", indicatingwhich axes should be logarithmic."y" and"r" bothrefer to the rate scale. Defaults to"y".

las

seepar.

ann

Should the curves be annotated?

a.ann

Logical indicating whether age-curves should be annotated.

p.ann

do. for period-curves.

c.ann

do. for cohort-curves.

xannx

The fraction that the x-axis is expanded when curves are annotated.

cex.ann

Expansion factor for characters annotating curves.

a.thin

Vector of integers indicating which of the age-classesshould be labelled.

p.thin

do. for the periods.

c.thin

do. for the cohorts.

col

Colours for the curves.

a.col

Colours for the age-curves.

p.col

do. for the period-curves.

c.col

do. for the cohort-curves.

p.lines

Should rates from the same period be connected?

c.lines

Should rates from the same cohort be connected?

...

Additional arguments pssed on tomatlines whenplotting the curves.

Details

Zero values of the rates are ignored. They are neiter in the plot nor inthe calculation of the axis ranges.

Value

NULL. The function is used for its side-effect, the plot.

Author(s)

Bendix Carstensen, Steno Diabetes Center,http://bendixcarstensen.com

Examples

data( blcaIT )attach(blcaIT)# Table of rates:bl.rate <- tapply( D, list(age,period), sum ) /           tapply( Y, list(age,period), sum )bl.rate# The four classical plots:par( mfrow=c(2,2) )rateplot( bl.rate*10^6 )# The labels on the vertical axis could be nicer:rateplot( bl.rate*10^6, at=10^(-1:3), labels=c(0.1,1,10,100,1000) ) # More bells an whistlespar( mfrow=c(1,3), mar=c(3,3,1,1), oma=c(0,3,0,0), mgp=c(3,1,0)/1.6 )rateplot( bl.rate*10^6, ylab="", ann=TRUE, which=c("AC","PA","CA"),                      at=10^(-1:3), labels=c(0.1,1,10,100,1000),                      col=topo.colors(11), cex.ann=1.2 )

A function to cut follow-up at intermediate event times.

Description

Cuts follow-up at intermediate event times, multiple events per personare allowed, as well as recurrences of the sme type of event. Theresulting states only refer to the last assumed state, unlike the resultfrommcutLexis.

Usage

rcutLexis( Lx, cut,    timescale = 1,    precursor.states = transient(Lx))

Arguments

Lx

ALexis object to be amended,.

cut

A data frame with columnslex.id,cut (event times) andnew.state (event type, character)

timescale

What time scale do values incut$cut refer to. Numeric or character.

precursor.states

an optional vector of states to be consideredas "less severe" thannew.state. See Details in thedocumentation ofcutLexis

Value

ALexis object with follow-up cut at the eventtimes supplied incut

Author(s)

Bendix Carstensen,http://bendixcarstensen.com

Examples

df <- data.frame(lex.id = rep(c(3, 7), c(3, 5)))df$new.state <- sample(LETTERS[2:4], 8, r = TRUE) df$cut <- round(runif(8) * 100) + 1dfLx <- Lexis( exit = list(time=c(89, 97)),               id = c(3, 7),      exit.status = factor(c("A", "X")) )Lx rcutLexis(Lx, df, pre = "A")

Remove transitions from a Lexis object.

Description

Sometimes certain transitions are not of interest. This function removesthese and assigns the risk time in the target state of the transitionsto the originating state.

Usage

rm.tr(obj, from, to)

Arguments

obj

ALexis object.

from

Character; name of the state from which the transition to be purgedoriginates. Must be a valid state name forobj.

to

Character; name of the state to which the transition to be purgedtargets. Must be a valid state name forobj.

Details

The function removes all transitions fromfrom toto, andassigns all risk time in theto state after the transition(lex.dur) to thefrom state. This is only done for risktime into occurring directly afterfrom. Risk time into occurring after a transition from states different fromfrom is not affected. Transitions fromto to anotherstate,other, say, will be changed to transitions fromfrom toother.

Value

ALexis object with the indicated transition removed.

Author(s)

Bendix Carstensen,http://bendixcarstensen.com.

Examples

data(DMlate)dml <- Lexis( entry = list(Per=dodm, Age=dodm-dobth, DMdur=0 ),               exit = list(Per=dox),        exit.status = factor(!is.na(dodth),labels=c("DM","Dead")),               data = DMlate )# A small subset for illustrationdml <- subset( dml, lex.id %in% c(13,15,20,28,40) )# Cut the follow-up at start of insulin therapydmi <- cutLexis( dml, cut = dml$doins,                      pre = "DM",                new.state = "Ins" )[,1:10]# How does it look?dmi# Remove all transitions DM -> Insrm.tr( dmi, "DM", "Ins" )

Simulate a Lexis object representing follow-up in a multistatemodel.

Description

Based on a (pre-)Lexis object representing personsat given states and times, and full specification of transitionintensities between states in the form of models for the transitionrates, this function simulates transition times and -types for personsand returns aLexis object representing the simulatedcohort. The simulation scheme accommodates multiple timescales,including time since entry into an intermediate state, and acceptsfitted Poisson models, Cox-models or just a function as specificationof rates.

Usage

simLexis( Tr, init,           N = 1,      lex.id,     t.range = 20,       n.int = 101,    time.pts = seq(0,t.range,length.out=n.int) )nState( obj, at, from, time.scale = 1 )pState( nSt, perm = 1:ncol(nSt) )## S3 method for class 'pState'plot( x,                     col = rainbow(ncol(x)),                  border = "transparent",                    xlab = "Time",                    ylim = 0:1,                    ylab = "Probability", ... )## S3 method for class 'pState'lines( x,                      col = rainbow(ncol(x)),                   border = "transparent", ... )

Arguments

Tr

A named list of named lists. The names of the list are namesof the transient states in the model. Each list element is again anamed list. The names of the elements of this inner list are thenames of the states reachable from the state with name equal to thelist. Elements of the intter lists represent transitions. See details.

init

A (pre-)Lexis object representing the initialstate of the persons whose trajectories through the multiple stateswe want to simulate. Must have attributes "time.scales" and "time.since" — seedetails. Duplicate values oflex.id are not sensible and notaccepted.

N

Numeric. How many persons should be simulated.Npersons with covariate configuration of each row ininitwill be simulated. Either a scalar or a vector of lengthnrow(init).

lex.id

Vector of ids of the simulated persons. Useful whensimulating in chunks.

t.range

Numerical scalar. The range of time over which tocompute the cumulative rates when simulating. Simulted timesbeyond this will result in an obervation censored att.rangeafter entry.

n.int

Number of intervals to use when computing (cumulative) rates.

time.pts

Numerical vector of times since start. Cumulativerates for transitions are computed at these times after stater andentry state. Simulation is only done till timemax(time.pts)after start, where persons are censored. Must start with 0.

obj

ALexis object.

from

The point on the time scaletime.scale from whichwe start counting.

time.scale

The timescale to whichfrom refer.

at

Time points (afterfrom) where the number of personsin each state is to be computed.

nSt

A table obtained bynState.

perm

A permutation of columns used before cumulating row-wiseand taking percentages.

x

An object of classpState, e.g. created bypState.

col

Colors for filling the areas between curves.

border

Colors for outline of the areas between curves.

xlab

Label on x-axis

ylim

Limits on y-axis

ylab

Label on y-axis

...

Further arguments passed on toplot.

Details

The simulation commandsimLexis is not defined as amethod forLexis objects, because the input is not aLexis object, theLexis-like object is merelyrepresenting a prevalent population and a specification of whichvariables that are timescales. The variableslex.dur andlex.Xst are ignored (and overwritten) if present. The coreinput is the listTr giving the transitions.

The components ofTr represents the transition intensitiesbetween states. The transition from stateA toB, say,is assumed stored inTr$A$B. Thus names of the elements ofTr are names of transient states, and the names of the elementsof each these are the names of states reachable from the correspondingtransient state.

The transition intensities are assumed modelled by either a glm withPoisson family or a Cox-model. In both cases the timescale(s) in themodel must be using the names fo the timescales in a Lexis objectrepresentng the follow-up in a cohort, and the risk time must be takenfrom the variablelex.dur — see the example.

Alternatively, an element inTr could be a functionthat from a data frame produces transition rates, or specificallycumulative transition rates over intervals of lengthlex.dur.

The pre-Lexis objectinit must contain values of allvariables used in any of the objects inTr, as well as alltimescales - even those not used in the models. Moreover, theattributestime.scales andtime.since must bepresent. The attributetime.since is a character vector of thesame length astime.scales and an element has value"A"if the corresponding time scale is defined as"time since entry into stateA", otherwise the value is"". If not present it will be set to a vector of""s,which is only OK if no time scales are defined as time since entry toa state.

Note that the variables pre-Lexis objectinit must havethe same mode and class as in the dataset used for fitting the models— hence the indexing of rows by brackets in the assignment of values used inthe example below - this way the variables have their attributespreserved; usinginit[,"var"] <- orinit$var <- replacesthe variable, whereasinit[1:4,"var"] <- orinit$var[1:4] <- replaces values only and prevents you fromentering non-existing factor levels etc.

The functionLexis automatically generates an attributetime.since, andcutLexis updates it when new timescales are defined. Hence, the simplest way of defining a initialpre-Lexis object representing a current state of a (set of) personsto be followed through a multistate model is to takeNULL rowsof an existing Lexis object (normally the one used for estimation),and so ensuring that all relevant attributes and state levels areproperly defined. See the example code.

The prevalence functionnState computes the distribution ofindividuals in different states at prespecified times. Only sensiblefor a simulatedLexis object. The functionpState takesa matrix as output bynState and computes the row-wisecumulative probabilities across states, and leaves an object of classpState, suitable for plotting.

Value

simLexis returns aLexis object representingthe experience of a population starting asinit followedthrough the states according to the transitions inTr.

The functionnState returns a table of persons classified bystates at each of the times inat. Note that this function caneasily produce meaningless results, for example if applied to aLexis object not created by simulation. If you apply it to aLexis object generated bysimLexis, you must make surethat you start (from) the point where you started thesimulation on the correct timescale, and you will get funny results ifyou try to tabulate beyond the censoring time for the simulation.The resulting object has class"table".

The result from usingpState on the result fromnStatehas classc("pState","matrix").

Author(s)

Bendix Carstensen,http://bendixcarstensen.com.

Examples

data(DMlate)dml <- Lexis( entry = list(Per=dodm, Age=dodm-dobth, DMdur=0 ),               exit = list(Per=dox),        exit.status = factor(!is.na(dodth),labels=c("DM","Dead")),               data = DMlate[runif(nrow(DMlate))<0.1,] )# Split follow-up at insulin, introduce a new timescale,# and split non-precursor statesdmi <- cutLexis( dml, cut = dml$doins,                      pre = "DM",                new.state = "Ins",                new.scale = "t.Ins",             split.states = TRUE )# Split the follow in 1-year intervals for modellingSi <- splitLexis( dmi, 0:30/2, "DMdur" )# Define knotsnk <- 4( ai.kn <- with( subset(Si,lex.Xst=="Ins"),                 quantile( Age+lex.dur, probs=(1:nk-0.5)/nk ) ) )( ad.kn <- with( subset(Si,lex.Xst=="Dead"),                 quantile( Age+lex.dur, probs=(1:nk-0.5)/nk ) ) )( di.kn <- with( subset(Si,lex.Xst=="Ins"),                 quantile( DMdur+lex.dur, probs=(1:nk-0.5)/nk ) ) )( dd.kn <- with( subset(Si,lex.Xst=="Dead"),                 quantile( DMdur+lex.dur, probs=(1:nk-0.5)/nk ) ) )( td.kn <- with( subset(Si,lex.Xst=="Dead(Ins)"),                 quantile( t.Ins+lex.dur, probs=(1:nk-0.5)/nk ) ) )# Fit Poisson models to transition rateslibrary( splines )DM.Ins <- glm( (lex.Xst=="Ins") ~ Ns( Age  , knots=ai.kn ) +                                  Ns( DMdur, knots=di.kn ) +                                  I(Per-2000) + sex,               family=poisson, offset=log(lex.dur),               data = subset(Si,lex.Cst=="DM") )DM.Dead <- glm( (lex.Xst=="Dead") ~ Ns( Age  , knots=ad.kn ) +                                    Ns( DMdur, knots=dd.kn ) +                                    I(Per-2000) + sex,               family=poisson, offset=log(lex.dur),               data = subset(Si,lex.Cst=="DM") )Ins.Dead <- glm( (lex.Xst=="Dead(Ins)") ~ Ns( Age  , knots=ad.kn ) +                                          Ns( DMdur, knots=dd.kn ) +                                          Ns( t.Ins, knots=td.kn ) +                                          I(Per-2000) + sex,               family=poisson, offset=log(lex.dur),               data = subset(Si,lex.Cst=="Ins") )# Stuff the models into an object representing the transitionsTr <- list( "DM" = list( "Ins"       = DM.Ins,                         "Dead"      = DM.Dead  ),           "Ins" = list( "Dead(Ins)" = Ins.Dead ) )lapply( Tr, names )# Define an initial object - note the subsetting that ensures that# all attributes are carried overini <- Si[1,1:9][-1,]ini[1:2,"lex.Cst"] <- "DM"ini[1:2,"Per"] <- 1995ini[1:2,"Age"] <- 60ini[1:2,"DMdur"] <- 5ini[1:2,"sex"] <- c("M","F")str(ini)# Simulate 200 of each sex using the estimated models in TrsimL <- simLexis( Tr, ini, time.pts=seq(0,11,0.5), N=200 )summary( simL )# Find the number of persons in each state at a set of times.# Note that the times are shirter than the time-span simulated.nSt <- nState( subset(simL,sex=="M"),               at=seq(0,10,0.1), from=1995, time.scale="Per" )nSt# Show the cumulative prevalences in a different order than that of the# state-level ordering and plot them using all defaultspp <- pState( nSt, perm=c(1,2,4,3) )head( pp )plot( pp )# A more useful set-up of the graphclr <- c("orange2","forestgreen")par( las=1 )plot( pp, col=clr[c(2,1,1,2)] )lines( as.numeric(rownames(pp)), pp[,2], lwd=2 )mtext( "60 year old male, diagnosed 1995", side=3, line=2.5, adj=0 )mtext( "Survival curve", side=3, line=1.5, adj=0 )mtext( "DM, no insulin   DM, Insulin", side=3, line=0.5, adj=0, col=clr[1] )mtext( "DM, no insulin", side=3, line=0.5, adj=0, col=clr[2] )axis( side=4 )# Using a Cox-model for the mortality rates assuming the two mortality# rates to be proportional:# When we fit a Cox-model, lex.dur must be used in the Surv() function,# and the I() constrction must be used when specifying intermediate# states as covariates, since factors with levels not present in the# data will create NAs in the parameter vector returned by coxph, which# in return will crash the simulation machinery.library( survival )Cox.Dead <- coxph( Surv( DMdur, DMdur+lex.dur,                         lex.Xst %in% c("Dead(Ins)","Dead")) ~                   Ns( Age-DMdur, knots=ad.kn ) +                   I(lex.Cst=="Ins") +                   I(Per-2000) + sex,               data = Si )Cr <- list( "DM" = list( "Ins"       = DM.Ins,                         "Dead"      = Cox.Dead  ),           "Ins" = list( "Dead(Ins)" = Cox.Dead ) )simL <- simLexis( Cr, ini, time.pts=seq(0,11,0.2), N=200 )summary( simL )nSt <- nState( subset(simL,sex=="M"),               at=seq(0,10,0.2), from=1995, time.scale="Per" )pp <- pState( nSt, perm=c(1,2,4,3) )plot( pp )

Split follow-up time in a Lexis object

Description

ThesplitLexis function divides each row of aLexisobject into disjoint follow-up intervals according to the suppliedbreak points.

Usage

splitLexis(lex, breaks, time.scale, tol=.Machine$double.eps^0.5)

Arguments

lex

an object of classLexis

breaks

a vector of break points

time.scale

the name or number of the time scale to be split

tol

numeric value >= 0. Intervals shorter than this value aredropped

Value

An object of classLexis with multiple rows for each row ofthe argumentlex. Each row of the newLexis objectcontains the part of the follow-up interval that falls inside one ofthe time bands defined by the break points.

The variables representing the various time scales, are appropriatelyupdated in the newLexis object. The entry and exit statusvariables are also updated according to the rule that the entry statusis retained until the end of follow-up. All other variables areconsidered to represent variables that are constant in time, and soare replicated across all rows having the same id value.

Note

ThesplitLexis() function divides follow-up time into intervalsusing breakpoints that are common to all rows of theLexis object.To split aLexis object by break points that are unique to eachrow, use thecut.Lexis function.

Author(s)

Martyn Plummer

Examples

# A small bogus cohortxcoh <- structure( list( id = c("A", "B", "C"),                      birth = c("14/07/1952", "01/04/1954", "10/06/1987"),                      entry = c("04/08/1965", "08/09/1972", "23/12/1991"),                       exit = c("27/06/1997", "23/05/1995", "24/07/1998"),                       fail = c(1, 0, 1) ),                     .Names = c("id", "birth", "entry", "exit", "fail"),                  row.names = c("1", "2", "3"),                      class = "data.frame" )# Convert the character dates into numerical variables (fractional years)xcoh$bt <- cal.yr( xcoh$birth, format="%d/%m/%Y" )xcoh$en <- cal.yr( xcoh$entry, format="%d/%m/%Y" )xcoh$ex <- cal.yr( xcoh$exit , format="%d/%m/%Y" )# See how it looksxcoh# Define as Lexis object with timescales calendar time and ageLcoh <- Lexis( entry = list( per=en ),                exit = list( per=ex, age=ex-bt ),         exit.status = fail,                data = xcoh )# Default plot of follow-upplot( Lcoh )# With a grid and deaths as endpointsplot( Lcoh, grid=0:10*10, col="black" )points( Lcoh, pch=c(NA,16)[Lcoh$lex.Xst+1] )# With a lot of bells and whistles:plot( Lcoh, grid=0:20*5, col="black", xaxs="i", yaxs="i",      xlim=c(1960,2010), ylim=c(0,50), lwd=3, las=1 )points( Lcoh, pch=c(NA,16)[Lcoh$lex.Xst+1], col="red", cex=1.5 )# Split time along two time-axes( x2 <- splitLexis( Lcoh, breaks = seq(1900,2000,5), time.scale="per") )( x2 <- splitLexis( x2, breaks = seq(0,80,5), time.scale="age" ) )str( x2 )# Tabulate the cases and the person-yearssummary( x2 )tapply( status(x2,"exit")==1, list( timeBand(x2,"age","left"),                                    timeBand(x2,"per","left") ), sum )tapply( dur(x2),  list( timeBand(x2,"age","left"),                        timeBand(x2,"per","left") ), sum )

Functions to facilitate analysis of multistate models.

Description

stack.Lexis produces a stacked object suited for analysis ofseveral transition intensities simultaneously.

Usage

## S3 method for class 'Lexis'stack(x, ...)tmat( x, ... )## S3 method for class 'Lexis'tmat(x, Y=FALSE, mode = "numeric", ...)

Arguments

x

ALexis object.

Y

Logical. Should the risk time be put in the diagonal? This isa facility which is used byboxes.Lexis.

mode

Should the matrix be returned as a numeric matrix withNAs at unused places or (mode="logical") as a logicalmatrix withFALSE on the diagonal.

...

Not used.

Value

tmat.Lexis returns a square transition matrix, classified by thelevels oflex.Cst andlex.Xst, for every transitionoccurring the entry is the number of transitions occurring andNAin all oter entries. IfY=TRUE, the diagonal will contain therisk time in each of the states.

stack.Lexis returns a dataframe to be used for analysis ofmultistate data when all transitions are modelled together, for exampleif some parameters are required to be the same for different transitions.The dataframe has classstacked.Lexis, and inherits theattributestime.scales andbreaks from theLexisobject, and so functiontimeBand applies to astacked.Lexis object too.

The dataframe has same variables as the originalLexis object,but with each record duplicated as many times as there are possibleexits from the current state,lex.Cst. Two variables are added:lex.Fail, an indicator of wheter an event for the transitionnamed in the factorlex.Tr has occurred or not.lex.Tr isa factor with levels made up of combinations of the levels oflex.Cst andlex.Xst that do occur together inx,joined by a "->".

Author(s)

Bendix Carstensen,b@bxc.dk,http://bendixcarstensen.com

Examples

data(DMlate)str(DMlate)dml <- Lexis( entry=list(Per=dodm, Age=dodm-dobth, DMdur=0 ),               exit=list(Per=dox),        exit.status=factor(!is.na(dodth),labels=c("DM","Dead")),               data=DMlate )dmi <- cutLexis( dml, cut=dml$doins, new.state="Ins", pre="DM" )summary( dmi )ls.dmi <- stack( dmi )str( ls.dmi )# Check that all the transitions and person-years got across.with( ls.dmi, rbind( table(lex.Fail,lex.Tr),                     tapply(lex.dur,lex.Tr,sum) ) )

Tables of summary statistics

Description

stat.table creates tabular summaries of the data, using alimited set of functions. A list of index variables is usedto cross-classify summary statistics. It does NOT work insidewith()!

Usage

stat.table(index, contents = count(), data, margins = FALSE)## S3 method for class 'stat.table'print(x, width=7, digits,...)

Arguments

index

A factor, or list of factors, used for cross-classification.If the list is named, then the names will be used when printing thetable. This feature can be used to give informative labels to thevariables.

contents

A function call, or list of function calls. Only alimited set of functions may be called (See Details below). If thelist is named, then the names will be used when printing the table.

data

an optional data frame containing the variables to betabulated. If this is omitted, the variables will be searched for in thecalling environment.

margins

a logical scalar or vector indicating which marginaltables are to be calculated. If a vector, it should be the samelength as theindex argument: values corresponding toTRUE will be retained in marginal tables.

x

an object of classstat.table.

width

a scalar giving the minimum column width when printing.

digits

a scalar, or named vector, giving the numberof digits to print after the decimal point. If a named vector is used,the names should correspond to one of the permitted functions (SeeDetails below) and all results obtained with that function will beprinted with the same precision.

...

further arguments passed to other print methods.

Details

This function is similar totapply, with some enhancements:multiple summaries of multiple variables may be mixed in thesame table; marginal tables may be calculated; columns and rows maybe given informative labels; pretty printing may be controlled by theassociated print method.

This function is not a replacement fortapply as it also hassome limitations. The only functions that may be used in thecontents argument are:count,mean,weighted.mean,sum,quantile,median,IQR,max,min,ratio,percent, andsd.

Thecount() function, which is the default, simply creates acontingency table of counts. The other functions are applied toeach cell created by combinations of theindex variables.

Value

An object of classstat.table, which is a multi-dimensionalarray. A print method is available to create formatted one-way andtwo-way tables.

Note

The permitted functions in the contents list are defined insidestat.table. They have the same interface asthe functions callable from the command line, except for twodifferences. If there is an argumentna.rm then its defaultvalue is alwaysTRUE. A second difference is that thequantile function can only produce a single quantile in each call.

Author(s)

Martyn Plummer

Examples

data(warpbreaks)# A one-way tablestat.table(tension,list(count(),mean(breaks)),data=warpbreaks)# The same table with informative labelsstat.table(index=list("Tension level"=tension),list(N=count(),           "mean number of breaks"=mean(breaks)),data=warpbreaks)# A two-way tablestat.table(index=list(tension,wool),mean(breaks),data=warpbreaks)  # The same table with margins over tension, but not woolstat.table(index=list(tension,wool),mean(breaks),data=warpbreaks,           margins=c(TRUE, FALSE))# A table of column percentagesstat.table(list(tension,wool), percent(tension), data=warpbreaks)# Cell percentages, with marginsstat.table(list(tension,wool),percent(tension,wool), margin=TRUE,           data=warpbreaks)# A table with multiple statistics# Note how each statistic has its own default precisiona <- stat.table(index=list(wool,tension),                contents=list(count(),mean(breaks),percent (wool)),                data=warpbreaks)print(a)# Print the percentages rounded to the nearest integerprint(a, digits=c(percent=0))

Special functions for use in stat.table

Description

These functions may be used ascontents arguments to thefunctionstat.table. They are defined internally instat.table and have no independent existence.

Usage

count(id)ratio(d,y,scale=1, na.rm=TRUE)percent(...)

Arguments

id

numeric vector in which identical values identify thesame individual.

d,y

numeric vectors of equal length (d for Deaths,y for person-Years)

scale

a scalar giving a value by which the ratio should bemultiplied

na.rm

a logical value indicating whetherNA values shouldbe stripped before computation proceeds.

...

a list of variables taken from theindex argumenttostat.table

Value

When used as acontents argument tostat.table, thesefunctions create the following tables:

count

If given without argument (count()) itreturns a contingency table of counts. If given anidargument it returns a table of the number of different values ofid in each cell, i.e. how many persons contribute in eachcell.

ratio

returns a table of valuesscale * sum(d)/sum(y)

percent

returns a table of percentages of theclassifying variables. Variables that are in theindexargument tostat.table but not in the call topercent are used to define strata, within which thepercentages add up to 100.

Author(s)

Martyn Plummer

Clinical trial: Steno2 baseline and follow-up.

Description

Steno-2 was a clinical trial conducted at Steno Diabetes Center1993-2001. The intervention was intensified treatment versusconventional treatment of diabetes patients with micro-albuminuria. Thedatsets here concern the extended follow-up of the trial population till2015. Three files are provided:steno2 with one record perperson,st2clin with one record per clinical visit andst2alb with one record per transition between states ofalbuminuria.

These dataset are entirely simulated, but designed to giveapproximately the same results as the original.

Usage

data("steno2")       data("st2clin")       data("st2alb")

Format

steno2 is a data frame with 160 observations on the following 14variables:

id: person id, numeric
allo: Original trial allocation, a factor with levelsIntConv
sex: Sex, a factor with levelsFM
baseCVD: 0/1 indicator of preexisting CVD at baseline
deathCVD: 0/1 indicator whether cause of death was CVD
doBth: Date of birth, a Date
doDM: Date of diabetes diagnosis, a Date
doBase: Date of entry to study, a Date
doCVD1: Date of 1st CVD event, a Date
doCVD2: Date of 2nd CVD event, a Date
doCVD3: Date of 3rd CVD event, a Date
doESRD: Date of end stage renal disease, a Date
doEnd: Date of exit from follow-up, a Date
doDth: Date of death, a Date

st2clin is data frame with 750 observations on clinicalmeasurements at different clinical visits:

id: person id, numeric
doV: Date of clinical visit, a Date
a1c: Glycosylated hemoglobin, mmol/mol
chol: Total cholesterol, mg/mol
crea: Creatinine, mg/mol

st2alb is data frame with 307 observations of changes incomplication (albuminuria) state

id: person id, numeric
doTr: Date of transition, a Date
state: State of albuminuria, factor with levelsNorm,Mic,Mac. All personsbegin in the stateMicro-albuminuria.

Details

The data are not the original; all values of measurements and dateshave been randomly perturbed, to prevent identifiability ofindividuals. Analysis of these data will give only (very)approximately the same results as in the published article, and onlysome of the aspects of data are included.

References

P. Gaede, J. Oellgaard, B. Carstensen, P. Rossing, H. Lund-Andersen,H. H. Parving & O. Pedersen: Years of life gained by multifactorialintervention in patients with type 2 diabetes mellitus andmicroalbuminuria: 21 years follow-up on the Steno-2randomised trial. Diabetologia (2016), 59, pp 2298-2307

Examples

data(steno2)data(st2alb)L2 <- Lexis( entry = list(per = doBase,                          age = doBase - doBth),              exit = list(per = doEnd),       exit.status = factor(deathCVD + !is.na(doDth),                            labels=c("Mic","D(oth)","D(CVD)")),                id = id,              data = cal.yr(steno2) )summary(L2)## Cut at intermediate transitionscut2 <- data.frame(lex.id = st2alb$id,                      cut = cal.yr(st2alb$do),                new.state = st2alb$state)L3 <- rcutLexis(L2, cut2)summary(L3)## no direct transitions Mic <-> Mac allowed, so put a cut in between:dd <- subset(L3, (lex.Cst == "Mac" & lex.Xst =="Norm") |                 (lex.Cst =="Norm" & lex.Xst == "Mac"))# artificial visits to the middle state Mic: cut3 <- data.frame( lex.id = dd$lex.id,                       cut = dd$per + dd$lex.dur/2,                 new.state = "Mic")L4 <- rcutLexis(L3, cut3)summary(L4)## Show all transitionsboxes(L4, boxpos = list(x = c(15,15,15,85,85),                        y = c(50,15,85,25,75)),          show.BE = TRUE, scale.R = 1000,          cex=0.8, pos.arr=0.7, font=1, font.arr=1)

Subsetting Lexis (and stacked.Lexis) objects

Description

Return subsets of Lexis objects which meet conditions

Usage

## S3 method for class 'Lexis'subset(x, ...)## S3 method for class 'Lexis'x[...]## S3 method for class 'stacked.Lexis'subset(x, ...)

Arguments

x

an object of classLexis

...

additional arguments to be passed tosubset.data.frame. This will normally be some logicalexpression selecting a subset of the rows. For details seesubset.data.frame.

Details

The subset method forLexis objects works exactly as the methodfor data frames. So does the "[" method. The special methods are needed inorder to propagate the Lexis-specific attributes.

The method forstacked.Lexis objects also shrinks the set oflevels forlex.Cst andlex.Xst to those actuallyoccurring in the resulting data frame.

Value

ALexis object with selected rows and columns.

Author(s)

Martyn Plummer

Summarize transitions and risk time from a Lexis object

Description

A two-way table of records and transitions classified by states(lex.Cst andlex.Xst), as well the risk time in each state.

Usage

  ## S3 method for class 'Lexis'summary(object, simplify = TRUE, scale = 1, by = NULL,                          Rates = FALSE, timeScales = FALSE, ...)  ## S3 method for class 'summary.Lexis'print(x, ..., digits = 2)

Arguments

object

A Lexis object.

simplify

Should rows with 0 follow-up time be dropped?

scale

Scaling factor for the rates. The calculated rates aremultiplied by this number.

by

Character vector of name(s) of variable(s) inobject. Used to give a separate summaries for subsets ofobject. If longer than than 1, the interaction between thatvariables is used to stratify the summary. It is also possible tosupply a vector of lengthnrow(object), and the distinctvalues of this will be used to stratify the summary.

Rates

Should a component with transition rates be returned (andprinted) too?

timeScales

Should the names of the timescales and the indicationof since which entry also be given?

x

ALexis orsummary.Lexis object.

digits

Number of digits after the decimal separator used whenprinting the summary.

...

Ignored.

Value

An object of classsummary.Lexis, a list with two components,Transitions andRates, each one a matrix with rowsclassified by states where persons spent time, and columns classifiedby states to which persons transit. TheTransitions containsnumber of transitions and has 4 extra columns with number of records,total number of events, total risk time and number of personcontributing attached. TheRates contains the transitionsrates.

If the argumentRates is FALSE (the default), then only thefirst component of the list is returned.

Author(s)

Bendix Carstensen,http://bendixcarstensen.com

Examples

data( nickel )# Lung cancer deaths and other deaths are coded 1 and 2nic <- Lexis( data = nickel,             entry = list(age = agein),              exit = list(age = ageout,cal = ageout+dob,tfh = ageout-age1st),       exit.status = factor( (icd > 0) + (icd %in% c(162,163)),                           labels = c("Alive","Other","Lung") ) )str( nic )head( nic )summary( nic )# More detailed summary, by exposure levelsummary( nic, by = nic$exposure>5, Rates = TRUE, scale = 100 )

Testis cancer incidence in Denmark, 1943–1996

Description

Number of testiscancer cases and male person-years in theDanish population 1943–1996

Usage

data(testisDK)

Format

A data frame with 4860 observations on the following 4 variables.

A: Age class, 0,1,2,...,89
P: Year, 1943,...,1996
D: Number of testis cancer cases
Y: Person years

Source

The Danish Cancer Registry

Examples

data(testisDK)head(testisDK)

Thorotrast Study

Description

Thethoro data frame has 2470 rows and 14 columns.Each row represents one patient that have had cerebral angiography (X-ray ofthe brain) with an injected contrast medium, either Thorotrast oranother one (the controls).

Format

This data frame contains the following columns:

id: Identification of person.
sex: Sex, 1: male / 2: female.
birthdat: Date of birth,Date variable.
contrast: Group, 1: Thorotrast / 2: Control.
injecdat: Date of contrast injection,Date variable.
volume: Injected volume of Thorotrast in ml. Controlpatients have a 0 in this variable.
exitdat: Date of exit from the study,Date variable.
exitstat: Status at exit, 1: dead / 2: alive,censored at closing of study, 20 February 1992 / 3:censored alive at some earlier date.
cause: Cause of death. See causes in the helpfile forgmortDK.
liverdat: Date of liver cancer diagnosis,Date variable.
liver: Indicator of liver cancer diagnosis. Not all livercancersare histologically verified, henceliver >= hepcc + chola + hmang
hepcc: Hepatocellular carcinoma atliverdat.
chola: Cholangiocellular carcinoma atliverdat.
hmang: Haemangisarcoma carcinoma atliverdat.

Source

M Andersson, M Vyberg, J Visfeldt, B Carstensen & HH Storm:Primary liver tumours among Danish patients exposed to Thorotrast.Radiation Research, 137, pp. 262–273, 1994.

M Andersson, B Carstensen HH Storm: Mortality and cancerincidence after cerebral angiography. Radiation Research, 142,pp. 305–320, 1995.

Examples

data(thoro)str(thoro)

Extract time band data from a split Lexis object

Description

The break points of aLexis object (created by a call tosplitLexis) divide the follow-up intervals into time bandsalong a given time scale. Thebreaks function returnsthe break points, for a given time scale, and thetimeBandclassifies each row (=follow-up interval) into one of the time bands.

Usage

timeBand(lex, time.scale, type="integer")breaks(lex, time.scale)

Arguments

lex

an object of classLexis

time.scale

a character or integer vector of length 1identifying the time scale of interest

type

a string that determines how the time bands are labelled.See Details below

Details

Time bands may be labelled in various ways according to thetype argument. The permitted values of thetypeargument, and the corresponding return values are:

"integer": a numeric vector with integer codes starting from 0.
"factor": a factor (unordered) with labels "(left,right]"
"left": the left-hand limit of the time band
"middle": the midpoint of the time band
"right": the right-hand limit of the time band

Value

Thebreaks function returns a vector of break pointsfor theLexis object, or NULL if no break points have beendefined by a call tosplitLexis. ThetimeBandfunction returns a numeric vector or factor, depending on the valueof thetype argument.

Note

A newly createdLexis object has no break points defined.In this case,breaks will return NULL, andtimeBand will a vector of zeros.

Author(s)

Martyn Plummer

Examples

data(diet)diet <- cal.yr(diet)diet.lex <- Lexis(entry=list(period=doe),                   exit=list(period=dox, age=dox-dob),            exit.status=chd,                   data=diet)diet.split <- splitLexis(diet.lex, breaks=seq(40,70,5), "age" )age.left <- timeBand(diet.split, "age", "left")table(age.left)age.fact <- timeBand(diet.split, "age", "factor")table(age.fact)age.mid <- timeBand(diet.split, "age", "mid")table(age.mid)

The time scales of a Lexis object

Description

Functions to get the names and type of the time scales of aLexis object.

Usage

timeScales(x)timeSince(x)tsNA20( x, all.scales=FALSE )

Arguments

x

an object of classLexis.

all.scales

Should NAs in all timescales be replaced by 0? IfFALSE (the default) only timescales defined as time sinceentry to a state getNAs replaced by 0s

Value

timeScales returns a character vector containing the names ofthe variables inx that represent the time scales. Extractedfrom thetime.scales attribute of the object.

timeSince returns a named character vector, the names being thenames of the timescales and the content being the names of the statesto which the corresponding timescale is defined as time sinceentry. For those time scales that are not defined as such an emptystring is used. Hence, if none of the timescales are defined as timesince entry to a statetimeSince will return a vector of emptystrings.

Author(s)

Martyn Plummer, Bendix Carstensen

Transform a Lexis (or stacked.Lexis) object

Description

Modify a Lexis object.

Usage

## S3 method for class 'Lexis'factorize(x, ..., verbose = FALSE)## S3 method for class 'Lexis'Relevel(x, ref, ...)## S3 method for class 'Lexis'levels(x)## S3 method for class 'Lexis'transform(`_data`, ...)## S3 method for class 'stacked.Lexis'transform(`_data`, ...) order.Lexis(x)  orderLexis(x)   sortLexis(x)

Arguments

_data

an object of classLexis.

x

an object of classLexis.

ref

New names (or order) of the factor levels (states) forlex.Cst andlex.Xst. Can be a list, in which case some levels arecollapsed, see the documentation forRelevel. Nosanity check for the latter type of operation is undertaken.

...

Additional arguments to be passed totransform.data.frame,Relevel.factor.

verbose

Logical. Should a list of new levels be printed?

Details

The transform method forLexis objects works exactly as themethod for data frames, but keeps theLexis attributes.

factorize transforms the variableslex.Cst andlex.Xst to factors with identical sets oflevels.

Relevel does the same asRelevel.factor, but forboth the factorslex.Cst andlex.Xst inx.lex.Cst andlex.Xst must be factors with the samelevels. They can be made so byfactorize.

Ifref is an integer or character vector, the levels oflex.Cst andlex.Xst are permuted to match the order ofref.

Ifref isNULL, as when for example the argument isnot passed to the function, the returned object have levels oflex.Cst,lex.Xst (and forstacked.Lexis objectslex.Tr) shaved down to the actually occurring values; that is,empty levels are discarded.

order.Lexis returns the order of the rows in a Lexis object to sortit by ()lex.id,ts), wherets is a timescale inthe Lexis object with noNAs.orderLexis is just a synonym.

sortLexis returns the Lexis object sorted by(lex.id,ts) wherets is one of thetimeScales with noNAs.

Value

A transformedLexis object.

The functionlevels returns the names of the states (levels ofthe factorslex.Cst andlex.Xst.

Author(s)

Martyn Plummer, Bendix Carstensen

Examples

data( nickel )nic <- Lexis( data = nickel,                id = id,             entry = list(age = agein),              exit = list(age = ageout,                          cal = ageout+dob,                          tfh = ageout-age1st),# Lung cancer deaths end as 2 and other deaths as 1       exit.status = factor((icd > 0) + (icd %in% c(162,163)),                            labels = c("Alive","Dead","Lung") ) )str( nic )levels( nic )nit <- transform( nic, cumex = exposure * (agein - age1st) )str( nit )# It is still a Lexis object!summary(nic)# change order of levelsnix <- Relevel(nic, c("Alive", "Lung", "Dead"))summary(nix)# change names of levels niw <- Relevel(nix, list("Alive" = 1, "Pulm" = "Lung", "Mort" = "Dead"))summary(niw)boxes(niw, boxpos = TRUE)# combine levelsniz <- Relevel(niw, list("Alive", c("Pulm", "Mort")), coll=" \n& ")summary(niz)par( new = TRUE )boxes(niz, boxpos = TRUE)#stack Lexis objectsiw <- stack(niw)str(siw)

Analysis of a two by two table

Description

Computes the usual measures of association in a 2 by 2 table withconfidence intervals. Also produces asymtotic and exact tests. Assumesthat comparison of probability of the first column level betweenlevels of the row variable is of interest. Output requires that theinput matrix has meaningful row and column labels.

Usage

twoby2(exposure, outcome,       alpha = 0.05, print = TRUE, dec = 4,       conf.level = 1-alpha, F.lim = 10000)

Arguments

exposure

If a table the analysis is based on the first two rowsand first two columns of this. If a variable, this variable istabulated against

outcome

as the second variable

alpha

Significance level

print

Should the results be printed?

dec

Number of decimals in the printout.

conf.level

1-alpha

F.lim

If the table total exceedsF.lim, Fisher's exacttest is not computed

Value

A list with elements:

table

The analysed 2 x 2 table augmented with probabilities andconfidence intervals. The confidence intervals for the probabilitiesare computed using the normal approximation to thelog-odds. Confidence intervals for the difference of proportions arecomputed using method 10 from Newcombe, Stat.Med. 1998, 17, pp.873ff.

measures

A table of Odds-ratios and relative risk withconfidence intervals.

p.value

Exact p-value for the null hypothesis of OR=1

Author(s)

Mark Myatt. Modified by Bendix Carstensen.

Examples

Treat <- sample(c("A","B"), 50, rep=TRUE )Resp <- c("Yes","No")[1+rbinom(50,1,0.3+0.2*(Treat=="A"))]twoby2( Treat, Resp )                 twoby2( table( Treat, Resp )[,2:1] ) # Comparison the other way round

Remove Lexis attributes from a`Lexis` object.

Description

Removes the Lexis attributes, including the classLexis from a Lexis object.

Usage

unLexis(Lx)

Arguments

Lx

A Lexis object

Value

The input object with "Lexis" removed from the class attribute.

Author(s)

Bendix Carstensen

Examples

# A small bogus cohortxcoh <- structure(list( id = c("A", "B", "C"),                     birth = c("14/07/1952", "01/04/1954", "10/06/1987"),                     entry = c("04/08/1965", "08/09/1972", "23/12/1991"),                      exit = c("27/06/1997", "23/05/1995", "24/07/1998"),                      fail = c(1, 0, 1) ),                    .Names = c("id", "birth", "entry", "exit", "fail"),                 row.names = c("1", "2", "3"),                     class = "data.frame")# Convert the character dates into numerical variables (fractional years)  xcoh <- cal.yr(xcoh, format="%d/%m/%Y", wh=2:4)# xcoh <- cal.yr(xcoh, format="%d/%m/%Y", wh=2:4)# Define a Lexis object with timescales calendar time and ageLcoh <- Lexis(entry = list(per = entry ),               exit = list(per = exit,                           age = exit - birth),        exit.status = fail,               data = xcoh)summary(Lcoh)try(summary(unLexis(Lcoh)))

Cut follow-up in a`Lexis` object by event date(s) whilepreserving the original states. This is essentially across-classification of the original states and the new ones, hence the "`x`".

Description

With a multistateLexis object we might want aclassification of the follow-up according to a set of events independentof the states in the Lexis object, while keeping the the original states.This is whatxcutLexis

Usage

xcutLexis(Lx, cut, timescale = 1, sep = ".")

Arguments

Lx

ALexis object

cut

A data frame with at most one row perlex.id, columns must be:

lex.id: person-id referring tolex.id inLx
cut, a numeric; times of event om the time scaletimescale
new.state, character or factor; name of the new stateoccurring at timecut

timescale

The timescale thatcut refers to. Numeric or character.

sep

Character; string used in forming new state names by joining originalstate names inLx and state names incut$new.states

Details

The function was motivated by a follow-up through states ofmultimorbidity with state names0morb,1morb,2morb, etc. where we wanted a subdivision of each state by thepresence of T2 diabetes, resulting in states0morb,1morb,2morb,0morb.T2,1morb.T2,2morb.T2.

At most 1 transition per person is allowed incut.

Value

ALexis object with states as inLx plus states named byconcatenatingLx state names with names incut$new.state

Author(s)

Bendix Carstensen, Steno Diabetes Center Copenhagen,b@bxc.dk,http://bendixcarstensen.com

Examples

example(DMlate)levels(dmi)## show transitions between states in dmiboxes(dmi, boxpos = list(x = rep(50, 4),                         y = c(65, 35, 95, 5)),           scale.R = 1000,           show.BE = TRUE)## randomly generated intermediate events X and Yset.seed(1952)cutXY <- data.frame(lex.id = unique(dmi$lex.id),          # one row per id                       cut = runif(nid(dmi), 1995, 2008), # event dates                 new.state = sample(c("X","Y"),           # event types                                    nid(dmi),             # only 4 in 10 has an event                                    repl = TRUE))[runif(nid(dmi)) < 0.4, ]## cut at these event dates but also keep original statesLxy <- xcutLexis(dmi, cutXY)levels(Lxy)## reorder the levels for easier specification of box placesLxy <- Relevel(Lxy, as.vector(t(outer(levels(dmi),                                      c("", ".X", ".Y"),                                      paste0))))levels(Lxy)summary(Lxy)## resulting transitions between boxesboxes(Lxy, boxpos = list(x = rep(c(50, 15, 85), 4),                         y = rep(c(65,35,95,5), each = 3)),           scale.R = 1000,           show.BE = TRUE,           cex = 0.8)

Movatterモバイル変換

Epi: Functions for manipulation and statistical analysis of epidemiological data

Description

Details

hivDK: seroconversion in a cohort of Danish men

Description

Usage

Format

Source

References

Examples

The Aalen-Johansen estimator of state probabilities from amultistateLexis object.

Description

Usage

Arguments

Value

Author(s)

See Also

Examples

Births in Denmark by year and month of birth and sex

Description

Usage

Format

Details

Source

Examples

Clinical status,relapse, metastasis and death in 2982 women with breast cancer.

Description

Usage

Format

Details

Source

References

Examples

Conversion to diabetes

Description

Usage

Format

Source

Examples

Epidemiological rates for diabetes in Denmark 1996–2015

Description

Usage

Format

Details

Examples

The Danish National Diabetes Register.

Description

Usage

Format

Details

Source

References

Examples

Fits a regression model to interval censored data.

Description

Usage

Arguments

Details

Value

Author(s)

References

See Also

Examples

Fit Lee-Carter-type models for rates to arbitrarily shaped observationsof rates in a Lexis diagram.

Description

Usage

Arguments

Details

Value

Author(s)

See Also

Examples

Create a Lexis object of follow-up

Description

Usage

Arguments

Details

Value

Note

The Aalen-Johansen estimator of state probabilities from amultistate`Lexis` object.

Convert a Lexis obejct to a data set suitable for input to the`msm:msm` function.