This vignette takes you through a case study where offsets are usedduring analysis withmvabund. We recommend reading the“Analysing discrete data” chapter from theEcostatstextbook.
Under some circumstances, we may tempted to standardise or averageour abundance counts across replicates before analysing them. Often thisis to account for differences in sampling intensity (different sizedplots, different numbers of replicates), but we caution users from doingso! Standardisation/averaging counts can change the distribution of yourdata, and affect the mean-variance relationship. Instead you should useanoffset, a term specially designed to account forknown sources of variation among replicates.
Lets take a look at a case study that uses anoffsetinmvabund
Anthony wants to evaluate how well earthworms numbers were doingafter some bush regeneration efforts. Here are some worm counts frompitfall traps across each of 10 sites (where C=control, R=bush regen).Notice there were only 4 pitfall traps at one site!
| Treatment | C | R | R | R | C | R | R | R | R | R |
| Count | 0 | 3 | 1 | 3 | 1 | 2 | 12 | 1 | 18 | 0 |
| # pitfalls | 5 | 5 | 5 | 5 | 5 | 5 | 5 | 4 | 5 | 5 |
Anthony would like to model the number of worms(Haplotaxida) from thereveg dataset as afunction of treatment (bush regeneration) while accounting variation insampling effort.
Lets loadmvabund and the dataset first:
library(mvabund)library(ecostats)data(reveg)attach(reveg)skimr::skim(abund$Haplotaxida)# Great function to get an overview of the data| Name | abund$Haplotaxida |
| Number of rows | 10 |
| Number of columns | 1 |
| _______________________ | |
| Column type frequency: | |
| numeric | 1 |
| ________________________ | |
| Group variables | None |
Variable type: numeric
| skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 | hist |
|---|---|---|---|---|---|---|---|---|---|---|
| data | 0 | 1 | 4.1 | 6.01 | 0 | 1 | 1.5 | 3 | 18 | ▇▁▁▁▁ |
Now we will use themanyglm function. We specifiedfamily = "negative.binomial" as count data often follows anegative binomial distribution. We have also included an offset term todeal with the fact that sampling effort varied across sites – with 5pitfall traps at most sites but just 4 at one site. Offsets are mostcommonly used in a log-linear model, where an offset is a predictorknown to have an exactly proportional effect on the response. The offsetin Anthony’s model for islog(pitfalls). By including thisterm, our model changes from predicting mean abundance to modelling meanabundance per pitfall trap. This is what you would be trying to do ifyou averaged the counts across pitfalls prior to analysing them, butusing an offset achieves this without messing with the data and itsmean-variance relationship.
worms_offset<-manyglm(Haplotaxida~treatment+offset(log(pitfalls)),family="negative.binomial",data=abund)anova(worms_offset)#> Time elapsed: 0 hr 0 min 0 sec#> Analysis of Deviance Table#>#> Model: Haplotaxida ~ treatment + offset(log(pitfalls))#>#> Multivariate test:#> Res.Df Df.diff Dev Pr(>Dev)#> (Intercept) 9#> treatment 8 1 2.889 0.153#> Arguments: P-value calculated using 999 iterations via PIT-trap resampling.