This vignette demonstrates the versatility and utility of thepredict.modeler() function when applied to a fitted model.This function is designed to handle models of classmodelerand provides several prediction types, outlined as follows:
Each type of prediction includes corresponding standard errors, whichare calculated using the delta method.
library(flexFitR)library(dplyr)library(kableExtra)library(ggpubr)library(purrr)data(dt_potato)head(dt_potato)|>kable()| Trial | Plot | Row | Range | gid | DAP | Canopy | GLI |
|---|---|---|---|---|---|---|---|
| HARS20_chips | 1 | 1 | 1 | W17037-24 | 0 | 0.000 | 0.0000000 |
| HARS20_chips | 1 | 1 | 1 | W17037-24 | 29 | 0.000 | 0.0027216 |
| HARS20_chips | 1 | 1 | 1 | W17037-24 | 36 | 0.670 | -0.0008966 |
| HARS20_chips | 1 | 1 | 1 | W17037-24 | 42 | 15.114 | 0.0322547 |
| HARS20_chips | 1 | 1 | 1 | W17037-24 | 56 | 75.424 | 0.2326896 |
| HARS20_chips | 1 | 1 | 1 | W17037-24 | 76 | 99.811 | 0.3345619 |
To illustrate the functionality ofpredict(), we use apotato dataset to fit logistic models of the form:
\[f(t) = \frac{L}{1 + e^{-k(t -t_0)}}\]
For simplicity, we’ll focus on just two plots from the dataset (plot40 and plot 166) out of the total 196 plots available. After fitting themodel, we’ll take a closer look at the parameter estimates, visualizethe fitted curves, and start making predictions.
mod_1<- dt_potato|>modeler(x = DAP,y = Canopy,grp = Plot,fn ="fun_logistic",parameters =c(L =100,k =4,t0 =40),subset = plots )print(mod_1)#>#> Call:#> Canopy ~ fun_logistic(DAP, L, k, t0)#>#> Residuals (`Standardized`):#> Min. 1st Qu. Median Mean 3rd Qu. Max.#> -1.5910 -0.6313 0.0326 -0.1693 0.4665 1.2007#>#> Optimization Results `head()`:#> uid L k t0 sse#> 40 99.8 0.199 47.7 37.4#> 166 99.0 0.262 42.8 23.2#>#> Metrics:#> Groups Timing Convergence Iterations#> 2 0.4459 secs 100% 712 (id)To make point predictions, we use thepredict() functionand specify the\(x\) value(s) forwhich we want to compute\(\hat{f}(x)\). By default, the predictiontype is set to"point", so it is unnecessary to explicitlyincludetype = "point".
| uid | fn_name | x_new | predicted.value | std.error |
|---|---|---|---|---|
| 40 | fun_logistic | 55 | 80.85282 | 2.912364 |
| 166 | fun_logistic | 55 | 95.06255 | 1.625402 |
A great way to visualize this is by plotting the fitted curve andoverlaying the predicted points.
mod_1|>plot(id = plots,type =3)+color_palette(palette ="jco")+geom_point(data = points,aes(x = x_new,y = predicted.value),shape =8)You’ll also notice that predictions come with standard errors, whichcan be adjusted using these_interval argument to choosebetween"confidence" or"prediction"intervals, depending on the type of intervals you want to generate(sometimes referred to as narrow vs. wide intervals).
| uid | fn_name | x_new | predicted.value | std.error |
|---|---|---|---|---|
| 40 | fun_logistic | 55 | 80.85282 | 3.994805 |
| 166 | fun_logistic | 55 | 95.06255 | 2.697961 |
The area under the fitted curve is another common calculation,especially when trying to summarize the overall behavior of a functionover a specific range. To compute the AUC, settype = "auc"and provide the interval of interest in thex argument. Youcan also specify the number of subintervals for the trapezoidal ruleapproximation usingn_points (e.g.,n_points = 500 provides a high-resolution approximationhere).
\[ \text{Area} = \int_{0}^{T} \frac{L}{1 +e^{-k(t - t_0)}} \, dt \]
| uid | fn_name | x_min | x_max | predicted.value | std.error |
|---|---|---|---|---|---|
| 40 | fun_logistic | 0 | 108 | 6018.833 | 89.57456 |
| 166 | fun_logistic | 0 | 108 | 6450.840 | 69.71235 |
In many cases, interest lies not in the parameters themselves but infunctions of these parameters. By using theformulaargument, we can compute user-defined functions of the estimatedparameters along with their standard errors. No additional arguments arerequired for this functionality.
| uid | fn_name | formula | predicted.value | std.error |
|---|---|---|---|---|
| 40 | fun_logistic | k/L * 100 | 0.1990757 | 0.0173468 |
| 166 | fun_logistic | k/L * 100 | 0.2644616 | 0.0312821 |
| uid | fn_name | formula | predicted.value | std.error |
|---|---|---|---|---|
| 40 | fun_logistic | (k * L)/4 | 4.959301 | 0.3761384 |
| 166 | fun_logistic | (k * L)/4 | 6.480310 | 0.7140117 |
For those interested in the derivatives of the fitted function,predict.modeler() provides tools to compute both the first(\(f'(x)\)) and second order (\(f''(x)\)) derivatives at specifiedpoints or over intervals. While derivatives for logistic functions arestraightforward to compute analytically, this is not true for many otherfunctions. To address this,predict() employs a numericalapproximation using the “Richardson” method.
For the logistic function, the first derivative has the followingform:
\[ f'(t) = \frac{k L e^{-k(t -t_0)}}{\left(1 + e^{-k(t - t_0)}\right)^2} \]
And the second derivative the following:
\[f''(t) = \frac{k^2 L e^{-k(t -t_0)} \left(1 - e^{-k(t - t_0)}\right)}{\left(1 + e^{-k(t -t_0)}\right)^3}\] Here the first derivative tells us the growthrate, while the second derivative reveals how the growth rate isaccelerating or decelerating.
To compute the first derivative, settype = "fd" in thepredict() function and provide points or intervals in thex argument. In case we just want to visualize the firstderivative we can use theplot() function.
The\(x\)-coordinate where themaximum occurs can be found programmatically, and the correspondingvalue of\(\hat{f}(x)\) can be computedusing point predictions as follows:
interval<-seq(0,100,by =0.1)points_fd<- mod_1|>predict(x = interval,type ="fd")|>group_by(uid)|>summarise(max_fd =max(predicted.value),argmax_fd = x_new[which.max(predicted.value)] )points_fd|>kable()| uid | max_fd | argmax_fd |
|---|---|---|
| 40 | 4.959300 | 47.7 |
| 166 | 6.480117 | 42.8 |
mod_1|>plot(id = plots,type =3)+color_palette(palette ="jco")+geom_vline(data = points_fd,aes(xintercept = argmax_fd),linetype =2)+geom_label(data = points_fd,aes(x = argmax_fd,y =0,label = argmax_fd))+facet_wrap(~uid)+theme(legend.position ="none")points_fd$y_hat<-sapply(X = plots,FUN = \(i) { mod_1|>predict(x = points_fd[points_fd$uid== i,"argmax_fd"],id = i)|>pull(predicted.value) })points_fd|>kable()| uid | max_fd | argmax_fd | y_hat |
|---|---|---|---|
| 40 | 4.959300 | 47.7 | 49.88846 |
| 166 | 6.480117 | 42.8 | 49.23131 |
mod_1|>plot(id = plots,type =3)+color_palette(palette ="jco")+geom_point(data = points_fd,aes(x = argmax_fd,y = y_hat),shape =8)Similarly, the second derivative can be calculated by settingtype = "sd". This derivative shows how the growth rateitself is changing, helping to determine when growth starts to slow downor speed up.
We can also identify where the second derivative reaches its maximumand minimum values, and plot these changes for a deeper understanding ofthe growth dynamics.
points_sd<- mod_1|>predict(x = interval,type ="sd")|>group_by(uid)|>summarise(max_sd =max(predicted.value),argmax_sd = x_new[which.max(predicted.value)],min_sd =min(predicted.value),argmin_sd = x_new[which.min(predicted.value)] )points_sd|>kable()| uid | max_sd | argmax_sd | min_sd | argmin_sd |
|---|---|---|---|---|
| 40 | 0.3793269 | 41.1 | -0.3793232 | 54.3 |
| 166 | 0.6530569 | 37.8 | -0.6530419 | 47.9 |
mod_1|>plot(id = plots,type =3)+color_palette(palette ="jco")+geom_vline(data = points_sd,aes(xintercept = argmax_sd),linetype =2)+geom_vline(data = points_sd,aes(xintercept = argmin_sd),linetype =2)+facet_wrap(~uid)+theme(legend.position ="none")Thepredict.modeler() function, as part of the modelingtoolkit, offers a range of useful predictions that can be tailored tovarious needs—whether it’s making point estimates, exploring the areaunder a curve, or analyzing derivatives. While the examples presentedhere showcase the flexibility and power of the function, they are justthe beginning. Every dataset and research question brings its own uniquechallenges, and we hope this vignette demonstrates howpredict.modeler() can help address those.