Accelerated failure time model

From Wikipedia, the free encyclopedia

Parametric model in survival analysis

In the statistical area of survival analysis, an accelerated failure time model (AFT model) is a parametric model that provides an alternative to the commonly used proportional hazards models. Whereas a proportional hazards model assumes that the effect of a covariate is to multiply the hazard by some constant, an AFT model assumes that the effect of a covariate is to accelerate or decelerate the life course of a disease by some constant. There is strong basic science evidence from C. elegans experiments by Stroustrup et al.^[1] indicating that AFT models are the correct model for biological survival processes.

Model specification

In full generality, the accelerated failure time model can be specified as^[2]

$$\lambda (t|\theta )=\theta \lambda _{0}(\theta t)$$

where $\theta$ denotes the joint effect of covariates, typically $\theta =\exp(-[\beta _{1}X_{1}+\cdots +\beta _{p}X_{p}])$. (Specifying the regression coefficients with a negative sign implies that high values of the covariates increase the survival time, but this is merely a sign convention; without a negative sign, they increase the hazard.)

This specification is satisfied if the probability density function of the event is taken to be $f(t|\theta )=\theta f_{0}(\theta t)$; it then follows for the survival function that $S(t|\theta )=S_{0}(\theta t)$. From this it is easy^[citation needed] to see that the moderated life time $T$ is distributed such that $T\theta$ and the unmoderated life time $T_{0}$ have the same distribution. Consequently, $\log(T)$ can be written as

$$\log(T)=-\log(\theta )+\log(T\theta ):=-\log(\theta )+\epsilon$$

where the last term is distributed as $\log(T_{0})$, i.e., independently of $\theta$. This reduces the accelerated failure time model to regression analysis (typically a linear model) where $-\log(\theta )$ represents the fixed effects, and $\epsilon$ represents the noise. Different distributions of $\epsilon$ imply different distributions of $T_{0}$, i.e., different baseline distributions of the survival time. Typically, in survival-analytic contexts, many of the observations are censored: we only know that $T_{i}>t_{i}$, not $T_{i}=t_{i}$. The former case represents survival beyond the last observed time (a right-censored observation), while the latter case represents an event (such as death) observed during the follow-up. Right-censored observations can pose technical challenges for estimating the model if the distribution of $T_{0}$ is unusual.
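The step from $S(t|\theta )=S_{0}(\theta t)$ to the statement that $T\theta$ and $T_{0}$ have the same distribution can be spelled out as follows. This is a short sketch that assumes $\theta >0$ is held fixed (that is, we condition on the covariates); the argument $s$ is a dummy variable introduced here for illustration.

```latex
\begin{aligned}
P(T\theta > s \mid \theta)
  &= P(T > s/\theta \mid \theta) \\
  &= S(s/\theta \mid \theta) \\
  &= S_{0}(\theta \cdot s/\theta) \\
  &= S_{0}(s) = P(T_{0} > s), \qquad s > 0 .
\end{aligned}
```

Since $T=(T\theta )/\theta$, taking logarithms gives $\log(T)=-\log(\theta )+\log(T\theta )$, where $\log(T\theta )$ is distributed as $\log(T_{0})$.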

The interpretation of $\theta$ in accelerated failure time models is straightforward: $\theta =2$ means that everything in the relevant life history of an individual happens twice as fast. For example, if the model concerns the development of a tumor, it means that all of the pre-stages progress twice as fast as for the unexposed individual, implying that the expected time until a clinical disease is 0.5 of the baseline time. However, this does not mean that the hazard function $\lambda (t|\theta )$ is always twice as high; that would be the proportional hazards model.
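The distinction between halving the time scale and doubling the hazard can be checked numerically. The sketch below assumes a log-logistic baseline with made-up scale `ALPHA` and shape `K` (neither value comes from the article) and applies $\lambda (t|\theta )=\theta \lambda _{0}(\theta t)$ and $S(t|\theta )=S_{0}(\theta t)$ with $\theta =2$.

```python
import math

# Hypothetical log-logistic baseline parameters, chosen only for illustration.
ALPHA, K = 10.0, 2.0

def baseline_survival(t):
    """S_0(t) = 1 / (1 + (t/ALPHA)^K)."""
    return 1.0 / (1.0 + (t / ALPHA) ** K)

def baseline_hazard(t):
    """lambda_0(t) = (K/t) * u / (1 + u), with u = (t/ALPHA)^K and t > 0."""
    u = (t / ALPHA) ** K
    return (K / t) * u / (1.0 + u)

def survival(t, theta):
    """AFT survival function: S(t | theta) = S_0(theta * t)."""
    return baseline_survival(theta * t)

def hazard(t, theta):
    """AFT hazard: lambda(t | theta) = theta * lambda_0(theta * t)."""
    return theta * baseline_hazard(theta * t)

def median(theta):
    """Median of T given theta: solve S_0(theta * t) = 0.5, i.e. theta * t = ALPHA."""
    return ALPHA / theta

def hazard_ratio(t, theta):
    """Hazard at time t relative to the baseline hazard at the same time."""
    return hazard(t, theta) / baseline_hazard(t)

theta = 2.0
print("median ratio:", median(theta) / median(1.0))
for t in (0.001, 10.0, 1e6):
    print(f"t = {t:g}: hazard ratio = {hazard_ratio(t, theta):.4f}")
```

Under these assumptions the median time is exactly half the baseline median, while the hazard ratio is not constant: it is close to 4 at very early times, about 1.6 at `t = ALPHA`, and approaches 1 at late times.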

Statistical issues

Unlike proportional hazards models, in which Cox's semi-parametric proportional hazards model is more widely used than parametric models, AFT models are predominantly fully parametric, i.e. a probability distribution is specified for $\log(T_{0})$. (Buckley and James^[3] proposed a semi-parametric AFT but its use is relatively uncommon in applied research; in a 1992 paper, Wei^[4] pointed out that the Buckley–James model has no theoretical justification and lacks robustness, and reviewed alternatives.) The reliance on a fully specified distribution can be a problem if a degree of realistic detail is required for modelling the distribution of a baseline lifetime. Hence, technical developments in this direction would be highly desirable.

When a frailty term is incorporated in the survival model, the regression parameter estimates from AFT models are robust to omitted covariates, unlike proportional hazards models. They are also less affected by the choice of probability distribution for the frailty term.^[5]^[6]

The results of AFT models are easily interpreted.^[7] For example, the results of a clinical trial with mortality as the endpoint could be interpreted as a certain percentage increase in future life expectancy on the new treatment compared to the control. So a patient could be informed that they would be expected to live (say) 15% longer if they took the new treatment. Hazard ratios can prove harder to explain in layman's terms.
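As a rough illustration of this interpretation, the sketch below converts a regression coefficient into a percentage change in survival time. It assumes the sign convention $\theta =\exp(-[\beta _{1}X_{1}+\cdots +\beta _{p}X_{p}])$ used in the model specification, a single 0/1 treatment indicator, and a made-up coefficient; the function names are hypothetical and not taken from any library.

```python
import math

def acceleration_factor(beta, x):
    """theta = exp(-(beta . x)), following the article's sign convention."""
    return math.exp(-sum(b * xi for b, xi in zip(beta, x)))

def time_ratio(beta, x_new, x_ref):
    """Ratio of survival times, new versus reference covariates.

    Survival time scales as 1/theta, so the ratio is theta_ref / theta_new.
    """
    return acceleration_factor(beta, x_ref) / acceleration_factor(beta, x_new)

# Hypothetical fitted coefficient for a 0/1 treatment indicator.
beta = [math.log(1.15)]

ratio = time_ratio(beta, x_new=[1.0], x_ref=[0.0])
percent_longer = 100.0 * (ratio - 1.0)
print(f"time ratio = {ratio:.3f}, about {percent_longer:.0f}% longer on treatment")
```

Because every quantile of the survival time is scaled by the same factor $1/\theta$, the same percentage applies to the median and, when it exists, to the mean.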

Distributions used in AFT models

The log-logistic distribution provides the most commonly used AFT model^[citation needed]. Unlike the Weibull distribution, it can exhibit a non-monotonic hazard function which increases at early times and decreases at later times. It is somewhat similar in shape to the log-normal distribution but it has heavier tails. The log-logistic cumulative distribution function has a simple closed form, which becomes important computationally when fitting data with censoring. For the censored observations one needs the survival function, which is the complement of the cumulative distribution function, i.e. one needs to be able to evaluate $S(t|\theta )=1-F(t|\theta )$.
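To make the role of the closed-form survival function concrete, here is a minimal sketch of a right-censored log-likelihood for a log-logistic AFT model. It assumes the parameterisation $\log(T)=\mu +\sigma \epsilon$ with $\mu =-\log(\theta )=\beta _{1}X_{1}+\cdots +\beta _{p}X_{p}$ and $\epsilon$ standard logistic; the scale parameter `sigma`, the function names, and the toy data are illustrative assumptions, not part of the article.

```python
import math

def log1pexp(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def loglogistic_aft_loglik(beta, sigma, times, events, covariates):
    """Right-censored log-likelihood of a log-logistic AFT model.

    events[i] is 1 if the event was observed at times[i], 0 if censored there.
    """
    total = 0.0
    for t, d, x in zip(times, events, covariates):
        mu = sum(b * xi for b, xi in zip(beta, x))   # mu = -log(theta)
        z = (math.log(t) - mu) / sigma
        log_s = -log1pexp(z)                         # log S(t) = -log(1 + e^z)
        if d:
            # log f(t) = z - 2*log(1 + e^z) - log(sigma) - log(t)
            total += z + 2.0 * log_s - math.log(sigma) - math.log(t)
        else:
            # A censored observation contributes only log S(t).
            total += log_s
    return total

# Toy data, invented for illustration: intercept plus one binary covariate.
times = [2.0, 5.0, 3.5, 8.0]
events = [1, 0, 1, 0]
covariates = [[1.0, 0.0], [1.0, 1.0], [1.0, 0.0], [1.0, 1.0]]

ll = loglogistic_aft_loglik([1.0, 0.5], 0.8, times, events, covariates)
print("log-likelihood:", ll)
```

In practice this function would be passed to a numerical optimiser to estimate `beta` and `sigma`. Both branches use elementary functions only, which is the computational convenience the closed form provides.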

The Weibull distribution (including the exponential distribution as a special case) can be parameterised as either a proportional hazards model or an AFT model, and is the only family of distributions to have this property. The results of fitting a Weibull model can therefore be interpreted in either framework. However, the biological applicability of this model may be limited by the fact that the hazard function is monotonic, i.e. either decreasing or increasing.
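The dual interpretation of the Weibull model can be verified directly from the AFT specification $\lambda (t|\theta )=\theta \lambda _{0}(\theta t)$. The sketch below assumes a Weibull baseline with shape $k$ and scale $b$; these symbols are introduced here for illustration.

```latex
\begin{aligned}
\lambda_{0}(t) &= \frac{k}{b}\left(\frac{t}{b}\right)^{k-1}, \\
\lambda(t \mid \theta) &= \theta\,\lambda_{0}(\theta t)
  = \theta\,\frac{k}{b}\left(\frac{\theta t}{b}\right)^{k-1}
  = \theta^{k}\,\lambda_{0}(t).
\end{aligned}
```

An acceleration factor $\theta$ therefore corresponds to a constant hazard ratio $\theta ^{k}$. With $\theta =\exp(-[\beta _{1}X_{1}+\cdots +\beta _{p}X_{p}])$, the proportional hazards coefficients are the AFT coefficients multiplied by $-k$.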

Any distribution on a multiplicatively closed group, such as the positive real numbers, is suitable for an AFT model. Other distributions include the log-normal, gamma, hypertabastic, Gompertz, and inverse Gaussian distributions, although they are less popular than the log-logistic, partly as their cumulative distribution functions do not have a closed form. Finally, the generalized gamma distribution is a three-parameter distribution that includes the Weibull, log-normal and gamma distributions as special cases.

References

[1] Stroustrup, Nicholas (16 January 2016). "The temporal scaling of Caenorhabditis elegans ageing". Nature. 530 (7588): 103–107. Bibcode:2016Natur.530..103S. doi:10.1038/nature16550. PMC 4828198. PMID 26814965.
[2] Kalbfleisch & Prentice (2002). The Statistical Analysis of Failure Time Data (2nd ed.). Hoboken, NJ: Wiley Series in Probability and Statistics.
[3] Buckley, Jonathan; James, Ian (1979). "Linear regression with censored data". Biometrika. 66 (3): 429–436. doi:10.1093/biomet/66.3.429. JSTOR 2335161.
[4] Wei, L. J. (1992). "The accelerated failure time model: A useful alternative to the cox regression model in survival analysis". Statistics in Medicine. 11 (14–15): 1871–1879. doi:10.1002/sim.4780111409. PMID 1480879.
[5] Lambert, Philippe; Collett, Dave; Kimber, Alan; Johnson, Rachel (2004). "Parametric accelerated failure time models with random effects and an application to kidney transplant survival". Statistics in Medicine. 23 (20): 3177–3192. doi:10.1002/sim.1876. hdl:2268/24489. PMID 15449337.
[6] Keiding, N.; Andersen, P. K.; Klein, J. P. (1997). "The Role of Frailty Models and Accelerated Failure Time Models in Describing Heterogeneity Due to Omitted Covariates". Statistics in Medicine. 16 (1–3): 215–224. doi:10.1002/(SICI)1097-0258(19970130)16:2<215::AID-SIM481>3.0.CO;2-J. PMID 9004393.
[7] Kay, Richard; Kinnersley, Nelson (2002). "On the use of the accelerated failure time model as an alternative to the proportional hazards model in the treatment of time to event data: A case study in influenza". Drug Information Journal. 36 (3): 571–579. doi:10.1177/009286150203600312.

Bradburn, MJ; Clark, TG; Love, SB; Altman, DG (2003). "Survival Analysis Part II: Multivariate data analysis - an introduction to concepts and methods". British Journal of Cancer. 89 (3): 431–436. doi:10.1038/sj.bjc.6601119. PMC 2394368. PMID 12888808.
Hougaard, Philip (1999). "Fundamentals of Survival Data". Biometrics. 55 (1): 13–22. doi:10.1111/j.0006-341X.1999.00013.x. PMID 11318147.
Collett, D. (2003). Modelling Survival Data in Medical Research (2nd ed.). CRC Press. ISBN 978-1-58488-325-8.
Cox, David Roxbee; Oakes, D. (1984). Analysis of Survival Data. CRC Press. ISBN 978-0-412-24490-2.
Marubini, Ettore; Valsecchi, Maria Grazia (1995). Analysing Survival Data from Clinical Trials and Observational Studies. Wiley. ISBN 978-0-470-09341-2.
Martinussen, Torben; Scheike, Thomas (2006). Dynamic Regression Models for Survival Data. Springer. ISBN 0-387-20274-9.
Bagdonavicius, Vilijandas; Nikulin, Mikhail (2002). Accelerated Life Models. Modeling and Statistical Analysis. Chapman & Hall/CRC. ISBN 1-58488-186-0.
