This vignette provides an introduction on how to fit non-linear multilevel models with brms. Non-linear models are incredibly flexible and powerful, but require much more care with respect to model specification and priors than typical generalized linear models. Ignoring group-level effects for the moment, the predictor term \(\eta_n\) of a generalized linear model for observation \(n\) can be written as follows:
\[\eta_n = \sum_{i = 1}^K b_ix_{ni}\]
where \(b_i\) is the regression coefficient of predictor \(i\) and \(x_{ni}\) is the data of predictor \(i\) for observation \(n\). This also comprises interaction terms and various other data transformations. However, the structure of \(\eta_n\) is always linear in the sense that the regression coefficients \(b_i\) are multiplied by some predictor values and then summed up. This implies that the hypothetical predictor term
\[\eta_n = b_1 \exp(b_2 x_n)\]
would not be a linear predictor anymore and we could not fit it using classical techniques of generalized linear models. We thus need a more general model class, which we will call non-linear models. Note that the term ‘non-linear’ does not say anything about the assumed distribution of the response variable. In particular, it does not mean ‘not normally distributed’ as we can apply non-linear predictor terms to all kinds of response distributions (for more details on response distributions available in brms see vignette("brms_families")).
We begin with a simple example using simulated data.
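A minimal simulation sketch is shown below; the true values b1 = 2 and b2 = 0.75 and the unit-variance Gaussian noise are assumptions chosen to be consistent with the estimates reported further down.

```r
# assumed true values of b1 and b2 (consistent with the estimates below)
b <- c(2, 0.75)
x <- rnorm(100)
# response generated from y = b1 * exp(b2 * x) plus unit-variance noise
y <- rnorm(100, mean = b[1] * exp(b[2] * x))
dat1 <- data.frame(x, y)
```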
As stated above, we cannot use a generalized linear model to estimate \(b\), so we go ahead and specify a non-linear model.
```r
prior1 <- prior(normal(1, 2), nlpar = "b1") +
  prior(normal(0, 2), nlpar = "b2")
fit1 <- brm(bf(y ~ b1 * exp(b2 * x), b1 + b2 ~ 1, nl = TRUE),
            data = dat1, prior = prior1)
```

When looking at the above code, the first thing that becomes obvious is that we changed the formula syntax to display the non-linear formula including predictors (i.e., x) and parameters (i.e., b1 and b2) wrapped in a call to bf. This stands in contrast to classical R formulas, where only predictors are given and parameters are implicit. The argument b1 + b2 ~ 1 serves two purposes. First, it specifies which variables in formula are parameters, and second, it specifies the linear predictor terms for each parameter. In fact, we should think of non-linear parameters as placeholders for linear predictor terms rather than as parameters themselves (see also the following examples). In the present case, we have no further variables to predict b1 and b2 and thus we just fit intercepts that represent our estimates of \(b_1\) and \(b_2\) in the model equation above. The formula b1 + b2 ~ 1 is a short form of b1 ~ 1, b2 ~ 1 that can be used if multiple non-linear parameters share the same formula. Setting nl = TRUE tells brms that the formula should be treated as non-linear.
In contrast to generalized linear models, priors on population-level parameters (i.e., ‘fixed effects’) are often mandatory to identify a non-linear model. Thus, brms requires the user to explicitly specify these priors. In the present example, we used a normal(1, 2) prior on (the population-level intercept of) b1, while we used a normal(0, 2) prior on (the population-level intercept of) b2. Setting priors is a non-trivial task in all kinds of models, especially in non-linear models, so you should always invest some time to think of appropriate priors. Quite often, you may be forced to change your priors after fitting a non-linear model for the first time, when you observe different MCMC chains converging to different posterior regions. This is a clear sign of an identification problem and one solution is to set stronger (i.e., more narrow) priors.
To obtain summaries of the fitted model, we apply the summary, plot, and conditional_effects methods.
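A sketch of these calls (the plotting arguments are illustrative):

```r
summary(fit1)                                   # posterior summaries
plot(fit1)                                      # trace and density plots
plot(conditional_effects(fit1), points = TRUE)  # model-implied regression line
```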
```
 Family: gaussian 
  Links: mu = identity 
Formula: y ~ b1 * exp(b2 * x) 
         b1 ~ 1
         b2 ~ 1
   Data: dat1 (Number of observations: 100) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
             Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
b1_Intercept     1.90      0.09     1.72     2.09 1.00     1374     1976
b2_Intercept     0.75      0.03     0.70     0.80 1.00     1474     1858

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     0.94      0.07     0.82     1.08 1.00     2098     1749

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
```

The summary method reveals that we were able to recover the true parameter values pretty nicely. According to the plot method, our MCMC chains have converged well and to the same posterior. The conditional_effects method visualizes the model-implied (non-linear) regression line.
We might also be interested in comparing our non-linear model to a classical linear model.
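A sketch of such a fit, matching the formula and data shown in the summary output below (gaussian is the default family):

```r
fit2 <- brm(y ~ x, data = dat1)  # classical linear model for comparison
summary(fit2)
```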
```
 Family: gaussian 
  Links: mu = identity 
Formula: y ~ x 
   Data: dat1 (Number of observations: 100) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     2.49      0.16     2.17     2.80 1.00     4469     3040
x             2.30      0.16     1.99     2.61 1.00     4784     3164

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma     1.53      0.11     1.33     1.76 1.00     3720     2914

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
```

To investigate and compare model fit, we can apply graphical posterior predictive checks, which make use of the bayesplot package on the backend.
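For instance, via the pp_check method:

```r
pp_check(fit1)  # posterior predictive check for the non-linear model
pp_check(fit2)  # the same check for the linear model
```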
We can also easily compare model fit using leave-one-out cross-validation.
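The comparison below can be produced with a single loo call:

```r
loo(fit1, fit2)  # leave-one-out cross-validation for both models
```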
```
Output of model 'fit1':

Computed from 4000 by 100 log-likelihood matrix.

         Estimate   SE
elpd_loo   -136.8  7.2
p_loo         2.8  0.8
looic       273.7 14.4
------
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.3, 1.1]).

Pareto k diagnostic values:
                         Count Pct.   Min. ESS
(-Inf, 0.7]  (good)      99    99.0%  1142
   (0.7, 1]  (bad)        1     1.0%  <NA>
   (1, Inf)  (very bad)   0     0.0%  <NA>
See help('pareto-k-diagnostic') for details.

Output of model 'fit2':

Computed from 4000 by 100 log-likelihood matrix.

         Estimate   SE
elpd_loo   -189.4 17.2
p_loo         9.0  5.9
looic       378.7 34.4
------
MCSE of elpd_loo is NA.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.6, 1.3]).

Pareto k diagnostic values:
                         Count Pct.   Min. ESS
(-Inf, 0.7]  (good)      99    99.0%  595
   (0.7, 1]  (bad)        0     0.0%  <NA>
   (1, Inf)  (very bad)   1     1.0%  <NA>
See help('pareto-k-diagnostic') for details.

Model comparisons:
     elpd_diff se_diff
fit1   0.0       0.0
fit2 -52.5      16.9
```

Since smaller LOOIC values indicate better model fit, it is immediately evident that the non-linear model fits the data better, which is of course not too surprising since we simulated the data from exactly that model.
On his blog, Markus Gesmann predicts the growth of cumulative insurance loss payments over time, originating from different origin years (see https://www.magesblog.com/post/2015-11-03-loss-developments-via-growth-curves-and/). We will use a slightly simplified version of his model for demonstration purposes here. It looks as follows:
\[cum_{AY, dev} \sim N(\mu_{AY, dev}, \sigma)\]
\[\mu_{AY, dev} = ult_{AY} \left(1 - \exp\left(- \left( \frac{dev}{\theta} \right)^\omega \right)\right)\]
The cumulative insurance payments \(cum\) will grow over time, and we model this dependency using the variable \(dev\). Further, \(ult_{AY}\) is the (to be estimated) ultimate loss of each accident year. It constitutes a non-linear parameter in our framework along with the parameters \(\theta\) and \(\omega\), which are responsible for the growth of the cumulative loss and are assumed to be the same across years. The data is already shipped with brms.
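We take a quick look at the first few rows (head simply previews the loss data set):

```r
head(loss)
```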
```
    AY dev      cum premium
1 1991   6  357.848   10000
2 1991  18 1124.788   10000
3 1991  30 1735.330   10000
4 1991  42 2182.708   10000
5 1991  54 2745.596   10000
6 1991  66 3319.994   10000
```

and translate the proposed model into a non-linear brms model.
```r
fit_loss <- brm(
  bf(cum ~ ult * (1 - exp(-(dev / theta)^omega)),
     ult ~ 1 + (1 | AY), omega ~ 1, theta ~ 1,
     nl = TRUE),
  data = loss, family = gaussian(),
  prior = c(
    prior(normal(5000, 1000), nlpar = "ult"),
    prior(normal(1, 2), nlpar = "omega"),
    prior(normal(45, 10), nlpar = "theta")
  ),
  control = list(adapt_delta = 0.9)
)
```

We estimate a group-level effect of accident year (variable AY) for the ultimate loss ult. This also shows nicely how a non-linear parameter is actually a placeholder for a linear predictor, which, in the case of ult, contains only a varying intercept over years. Again, priors on population-level effects are required and, for the present model, are actually mandatory to ensure identifiability. We summarize the model using well-known methods.
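For instance, via the summary method:

```r
summary(fit_loss)
```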
```
 Family: gaussian 
  Links: mu = identity 
Formula: cum ~ ult * (1 - exp(-(dev/theta)^omega)) 
         ult ~ 1 + (1 | AY)
         omega ~ 1
         theta ~ 1
   Data: loss (Number of observations: 55) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Multilevel Hyperparameters:
~AY (Number of levels: 10) 
                  Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sd(ult_Intercept)   747.55    240.07   423.46  1381.79 1.00     1090     1644

Regression Coefficients:
                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
ult_Intercept    5283.29    298.18  4708.31  5916.86 1.00     1110     1746
omega_Intercept     1.34      0.05     1.24     1.43 1.00     2436     2558
theta_Intercept    46.24      2.25    42.50    51.14 1.00     2272     2002

Further Distributional Parameters:
      Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
sigma   140.21     16.06   113.42   176.58 1.00     3289     2676

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
```

Next, we show marginal effects separately for each year.
```r
conditions <- data.frame(AY = unique(loss$AY))
rownames(conditions) <- unique(loss$AY)
me_loss <- conditional_effects(
  fit_loss, conditions = conditions,
  re_formula = NULL, method = "predict"
)
plot(me_loss, ncol = 5, points = TRUE)
```

It is evident that there is some variation in cumulative loss across accident years, for instance due to natural disasters happening only in certain years. Further, we see that the uncertainty in the predicted cumulative loss is larger for later years with fewer available data points. For a more detailed discussion of this data set, see Section 4.5 in Gesmann & Morris (2020).
As a third example, we want to show how to model more advanced item-response models using the non-linear model framework of brms. For simplicity, suppose we have a single forced choice item with three alternatives of which only one is correct. Our response variable is whether a person answers the item correctly (1) or not (0). Persons are assumed to vary in their ability to answer the item correctly. However, every person has a 33% chance of getting the item right just by guessing. We thus simulate some data to reflect this situation.
```r
inv_logit <- function(x) 1 / (1 + exp(-x))
ability <- rnorm(300)
p <- 0.33 + 0.67 * inv_logit(ability)
answer <- ifelse(runif(300, 0, 1) < p, 1, 0)
dat_ir <- data.frame(ability, answer)
```

The most basic item-response model is equivalent to a simple logistic regression model.
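A sketch of this baseline fit, matching the formula and family shown in the summary output below:

```r
# simple logistic regression: bernoulli family with its default logit link
fit_ir1 <- brm(answer ~ ability, data = dat_ir, family = bernoulli())
```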
However, this model completely ignores the guessing probability and will thus likely produce biased estimates and predictions.
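We inspect it via the summary method (the call producing the output below):

```r
summary(fit_ir1)
```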
```
 Family: bernoulli 
  Links: mu = logit 
Formula: answer ~ ability 
   Data: dat_ir (Number of observations: 300) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
Intercept     0.74      0.14     0.47     1.01 1.00     2751     2374
ability       0.86      0.16     0.57     1.19 1.00     2477     2175

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
```

A more sophisticated approach incorporating the guessing probability looks as follows:
```r
fit_ir2 <- brm(
  bf(answer ~ 0.33 + 0.67 * inv_logit(eta), eta ~ ability, nl = TRUE),
  data = dat_ir, family = bernoulli("identity"),
  prior = prior(normal(0, 5), nlpar = "eta")
)
```

It is very important to set the link function of the bernoulli family to identity or else we will apply two link functions. This is because our non-linear predictor term already contains the desired link function (0.33 + 0.67 * inv_logit), but the bernoulli family applies the default logit link on top of it. This will of course lead to strange and uninterpretable results. Thus, please make sure that you set the link function to identity whenever your non-linear predictor term already contains the desired link function.
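Again, the summary method produces the output below:

```r
summary(fit_ir2)
```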
```
 Family: bernoulli 
  Links: mu = identity 
Formula: answer ~ 0.33 + 0.67 * inv_logit(eta) 
         eta ~ ability
   Data: dat_ir (Number of observations: 300) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
eta_Intercept    -0.10      0.20    -0.52     0.28 1.00     2120     1779
eta_ability       1.39      0.29     0.86     1.99 1.00     2478     2301

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
```

We again compare model fit via leave-one-out cross-validation.
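The call mirrors the earlier comparison:

```r
loo(fit_ir1, fit_ir2)
```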
```
Output of model 'fit_ir1':

Computed from 4000 by 300 log-likelihood matrix.

         Estimate   SE
elpd_loo   -174.7  7.5
p_loo         2.2  0.2
looic       349.3 15.0
------
MCSE of elpd_loo is 0.0.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.5, 0.9]).

All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.

Output of model 'fit_ir2':

Computed from 4000 by 300 log-likelihood matrix.

         Estimate   SE
elpd_loo   -173.2  7.6
p_loo         1.9  0.2
looic       346.5 15.1
------
MCSE of elpd_loo is 0.0.
MCSE and ESS estimates assume MCMC draws (r_eff in [0.4, 1.0]).

All Pareto k estimates are good (k < 0.7).
See help('pareto-k-diagnostic') for details.

Model comparisons:
        elpd_diff se_diff
fit_ir2   0.0       0.0
fit_ir1  -1.4       1.5
```

The comparison shows that both models fit the data equally well, but remember that predictions of the first model might still be misleading as they may well be below the guessing probability for low ability values. Now, suppose that we don’t know the guessing probability and want to estimate it from the data. This can easily be done by changing the previous model just a bit.
```r
fit_ir3 <- brm(
  bf(answer ~ guess + (1 - guess) * inv_logit(eta),
     eta ~ 0 + ability, guess ~ 1, nl = TRUE),
  data = dat_ir, family = bernoulli("identity"),
  prior = c(
    prior(normal(0, 5), nlpar = "eta"),
    prior(beta(1, 1), nlpar = "guess", lb = 0, ub = 1)
  )
)
```

Here, we model the guessing probability as a non-linear parameter, making sure that it stays within the interval \([0, 1]\). We did not estimate an intercept for eta, as this would lead to a bias in the estimated guessing parameter (try it out; this is an excellent example of how careful one has to be in non-linear models).
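Once more, the summary method:

```r
summary(fit_ir3)
```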
```
 Family: bernoulli 
  Links: mu = identity 
Formula: answer ~ guess + (1 - guess) * inv_logit(eta) 
         eta ~ 0 + ability
         guess ~ 1
   Data: dat_ir (Number of observations: 300) 
  Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
         total post-warmup draws = 4000

Regression Coefficients:
                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
eta_ability         1.32      0.26     0.85     1.87 1.00     2163     1850
guess_Intercept     0.31      0.05     0.21     0.41 1.00     2725     2217

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
and Tail_ESS are effective sample size measures, and Rhat is the potential
scale reduction factor on split chains (at convergence, Rhat = 1).
```

The results show that we are able to recover the simulated model parameters with this non-linear model. Of course, real item-response data have multiple items so that accounting for item and person variability (e.g., using a multilevel model with varying intercepts) becomes necessary as we have multiple observations per item and person. Luckily, this can all be done within the non-linear framework of brms and I hope that this vignette serves as a good starting point.
Gesmann, M. & Morris, J. (2020). Hierarchical Compartmental Reserving Models. CAS Research Papers.