Movatterモバイル変換


[0]ホーム

URL:


Chapter 12 – Visualising many responses –Exercise solutions and Code Boxes

David Warton

2022-08-24

Exercise 12.1: Flower size and shape

How do these three species of Iris differ in terms of flower sizeand shape?

What sort of graph should Edgar produce to visualise how speciesdiffer in flower size and shape?

He has more than two response variables so I guess as well asplotting each response one-at-a-time he could try plotting themsimultaneously using ordination (factor analysis, for example). Ineither plot he would want to label different species differently to seevariation across species.

Exercise 12.2: Bush regeneration and invertebrate counts

Is there evidence of a change in invertebrate communities due tobush regeneration efforts?

What sort of graph should Anthony produce to visualise theeffects of bush regeneration on invertebrate communities?

Anthony has a bunch of response variables so I guess as well asplotting each response one-at-a-time he could try plotting themsimultaneously using ordination. He has counts so will need somethingthat works for non-Gaussian responses like generalised latent variablemodels. In either plot he would want to label different speciesdifferently to see variation between revegetated and control plots.

Code Box 12.1: Plotting the bush regeneration data of Exercise 12.1usingmvabund.

library(mvabund)library(ecostats)data(reveg)reveg$abundMV=mvabund(reveg$abund)#to treat data as multivariateplot(abundMV~treatment,data=reveg)#> Overlapping points were shifted along the y-axis to make them visible.#>#>  PIPING TO 2nd MVFACTOR#> Only the variables Collembola, Acarina, Formicidae, Coleoptera, Diptera, Amphipoda, Isopoda, Larvae, Hemiptera, Soleolifera, Hymenoptera, Araneae were included in the plot#> (the variables with highest total abundance).
plot of chunk code12.1
plot of chunk code12.1

Can you see any taxa that seem to be associated with bushregeneration?

There seem to be less invertebrates in control plots forCollembola,Acarina,Coloeptera,Amphipoda and maybe a few Orders.

Code Box 12.2: A PCA of Edgar’sIris data

data("iris")pc=princomp(iris[,1:4],cor=TRUE)pc#> Call:#> princomp(x = iris[, 1:4], cor = TRUE)#>#> Standard deviations:#>    Comp.1    Comp.2    Comp.3    Comp.4#> 1.7083611 0.9560494 0.3830886 0.1439265#>#>  4  variables and  150 observations.loadings(pc)#>#> Loadings:#>              Comp.1 Comp.2 Comp.3 Comp.4#> Sepal.Length  0.521  0.377  0.720  0.261#> Sepal.Width  -0.269  0.923 -0.244 -0.124#> Petal.Length  0.580        -0.142 -0.801#> Petal.Width   0.565        -0.634  0.524#>#>                Comp.1 Comp.2 Comp.3 Comp.4#> SS loadings      1.00   1.00   1.00   1.00#> Proportion Var   0.25   0.25   0.25   0.25#> Cumulative Var   0.25   0.50   0.75   1.00biplot( pc,xlabs=rep("\u00B0",dim(iris)[1]) )
plot of chunk code12.2
plot of chunk code12.2

Code Box 12.3: Factor analysis of the Iris data

library(psych)fa_iris<-fa(iris[,1:4],nfactors=2,fm="ml",rotate="varimax")loadings(fa_iris)#>#> Loadings:#>              ML1    ML2#> Sepal.Length  0.997#> Sepal.Width  -0.115 -0.665#> Petal.Length  0.871  0.486#> Petal.Width   0.818  0.514#>#>                  ML1   ML2#> SS loadings    2.436 0.942#> Proportion Var 0.609 0.236#> Cumulative Var 0.609 0.844

How do results compare to the principal componentsanalysis?

They look awfully similar. The second factor looks a littledifferent, and is flipped around the other way (so big vlues meannarrow sepals), but it also has postive loadings for petalvariables. So this could be interpreted as a measure of how large petalsare relative to sepal width: big scores for large petals with narrowsepals, low scores for small petals with wide sepals. Recall thatpreviously, the second PC was pretty much just a measure of how widesepals were, now it is relative to petal size.

Code Box 12.4: Assumption checking for a factor analysis of the Irisdata

par(mfrow=c(2,2),mar=c(3,3,2,1),mgp=c(1.75,0.75,0))for(iVarin1:4){  irisIvar= iris[,iVar]plotenvelope(lm(irisIvar~fa_iris$scores),which=1,col=iris$Species,main=print(names(iris)[iVar]),n.sim=99)}#> [1] "Sepal.Length"#> [1] "Sepal.Width"#> [1] "Petal.Length"#> [1] "Petal.Width"
plot of chunk code12.4
plot of chunk code12.4
plot(iris[,1:4],col=iris$Species)
plot of chunk code12.4matrix
plot of chunk code12.4matrix

(Note thatplotenvelope was run with just59 iterations, to speed up computation time.)

Exercise 12.4: A factor analysis for Anthony’s data(?)

Load Anthony’s revegetation data (stored asreveg intheecostats package) and do a factor analysis (with twofactors).

data(reveg)library(psych)fa_reveg<-try(fa(reveg$abund,nfactors=2,fm="ml",rotate="varimax"))#> Warning in cor.smooth(R): Matrix was not positive definite, smoothing was done#> In smc, smcs < 0 were set to .0#> Warning in cor.smooth(R): Matrix was not positive definite, smoothing was done#> In smc, smcs < 0 were set to .0#> Warning in log(e): NaNs produced#> Error in optim(start, FAfn, FAgr, method = "L-BFGS-B", lower = 0.005,  :#>   L-BFGS-B needs finite values of 'fn'

You might not be able to get a solution when using maximumlikelihood estimation (fm=“ml”), in which case, try fitting usingwithout specifying the fm argument (which tries to minimiseresiduals).

fa_reveg<-fa(reveg$abund,nfactors=2)#> Warning in cor.smooth(R): Matrix was not positive definite, smoothing was done#> In smc, smcs < 0 were set to .0#> Warning in cor.smooth(R): Matrix was not positive definite, smoothing was done#> In smc, smcs < 0 were set to .0#> Warning in cor.smooth(R): Matrix was not positive definite, smoothing was done#> In smc, smcs < 0 were set to .0#> Loading required namespace: GPArotation#> Warning in cor.smooth(r): Matrix was not positive definite, smoothing was done#> Warning in fa.stats(r = r, f = f, phi = phi, n.obs = n.obs, np.obs = np.obs, : The estimated#> weights for the factor scores are probably incorrect. Try a different factor score estimation#> method.#> In factor.scores, the correlation matrix is singular, an approximation is used#> Warning in cor.smooth(r): Matrix was not positive definite, smoothing was done

This returned some concerning warnings but did fit the model :)

Check some assumptions, by fitting a linear model to some of theresponse variables, as a function of factor scores.

par(mfrow=c(3,3),mar=c(3,3,2,1),mgp=c(1.75,0.75,0))for(iVarin1:9){  y=reveg$abund[,iVar]plotenvelope(lm(y~fa_reveg$scores),which=1,main=names(reveg$abund)[iVar],n.sim=99)}
plot of chunk ex12.4RFplots
plot of chunk ex12.4RFplots
par(mfrow=c(3,3),mar=c(3,3,2,1),mgp=c(1.75,0.75,0))for(iVarin1:9){  y=reveg$abund[,iVar]plotenvelope(lm(y~fa_reveg$scores),which=2,main=names(reveg$abund)[iVar],n.sim=99)}
plot of chunk ex12.4QQplots
plot of chunk ex12.4QQplots

Can you see any issues with factor analysis assumptions?

With only ten observations it is always hard to see issues, andresidual plots have huge error bars on them! But we have points outsidetheir normal quantile simulation envelopes in several of these plots,and most have a suggestion of right-skew (the occasional large value).Note also that fitted values go below zero for many species which ismost concerning considering we are modelling counts!

Code Box 12.5: Choosing the number of factors for the iris data

plot(fa_iris$values,type="l")
plot of chunk code12.5scree
plot of chunk code12.5scree
nFactors=3# to compare models with up to 3 factorsBICs=rep(NA,nFactors)# define the vector that BIC values go innames(BICs)=1:nFactors# name its values according to #factorsfor(iFactorsin1:nFactors) {  fa_iris<-fa(iris[,1:4],nfactors=iFactors,fm="ml",rotate="varimax")  BICs[iFactors]= fa_iris$objective-log(fa_iris$nh)* fa_iris$dof}BICs#>         1         2         3#> -9.436629  5.171006 15.031906

How many factors are supported by the data?

One, this has the smallest BIC.

Code Box 12.6: A generalised latent variable model for Anthony’srevegetation data

data(reveg)library(gllvm)#>#> Attaching package: 'gllvm'#> The following objects are masked from 'package:VGAM':#>#>     AICc, nobs, predict, vcov#> The following objects are masked from 'package:stats4':#>#>     nobs, vcov#> The following object is masked from 'package:vegan':#>#>     ordiplot#> The following object is masked from 'package:mvabund':#>#>     coefplot#> The following objects are masked from 'package:stats':#>#>     nobs, predict, simulate, vcovreveg_LVM=gllvm(reveg$abund,num.lv=2,family="negative.binomial",trace=TRUE,jitter.var=0.2)logLik(reveg_LVM)#> 'log Lik.' -675.5768 (df=95)

Repeating this several times usually returns an answer of-689.3, so we can be confident this is (close to) themaximum likelihood solution. To get a biplot of this solution:

ordiplot(reveg_LVM,col=as.numeric(reveg$treatment),biplot=TRUE,ind.spp=12)
plot of chunk code12.6bi
plot of chunk code12.6bi
par(mfrow=c(1,3),mar=c(3,3,1,1),mgp=c(1.75,0.75,0))plot(reveg_LVM,which=c(1,2,5))
plot of chunk code12.6plot
plot of chunk code12.6plot

Exercise 12.5: Checking analysis decisions for Anthony’srevegetation data

In Code Box 12.6, a negative binomial model was fitted, using twolatent variables. Are two latent variables needed, or should we usemore, or less? Fit a few models varying the number of latent variables.Which model fits the data best, according to BIC?

reveg_LVM1=gllvm(reveg$abund,num.lv=1,family="negative.binomial",trace=TRUE,jitter.var=0.2)reveg_LVM2=gllvm(reveg$abund,num.lv=2,family="negative.binomial",trace=TRUE,jitter.var=0.2)reveg_LVM3=gllvm(reveg$abund,num.lv=3,family="negative.binomial",trace=TRUE,jitter.var=0.2)reveg_LVM4=gllvm(reveg$abund,num.lv=4,family="negative.binomial",trace=TRUE,jitter.var=0.2)reveg_LVM5=gllvm(reveg$abund,num.lv=5,family="negative.binomial",trace=TRUE,jitter.var=0.2)BIC(reveg_LVM1,reveg_LVM2,reveg_LVM3,reveg_LVM4,reveg_LVM5)#>             df      BIC#> reveg_LVM1  72 1557.821#> reveg_LVM2  95 1578.730#> reveg_LVM3 117 1575.276#> reveg_LVM4 138 1587.444#> reveg_LVM5 158 1633.496

For me two latent variable models was the winner!

Fit a Poisson model to the data and check assumptions. Are thereany signs of overdispersion?

I’ll go with two latent variable models, on account of this lookingthe best in the above.

reveg_LVM1=gllvm(reveg$abund,num.lv=2,family="poisson",trace=TRUE,jitter.var=0.2)par(mfrow=c(1,3))plot(reveg_LVM1,which=c(1,2,5))
plot of chunk ex12.5
plot of chunk ex12.5

Wow this does not look good! There is a clear fan-shape in theresidual vs fits plot, which also shows up as an increasing trend in thescale-location plot. Points on the normal quantile plot are well outsidebounds on both sides, frequently falling below -5 or above 5 (when wewould expect most values between -3 and 3). These are all strong signsof overdispersion.

Code Box 12.7: A non-metric multi-dimensional scaling ordination ofAnthony’s data

library(vegan)ord_mds=metaMDS(reveg$abund,trace=0)#> Square root transformation#> Wisconsin double standardizationplot(ord_mds$points,pch=as.numeric(reveg$treatment),col=reveg$treatment)
plot of chunk code12.7
plot of chunk code12.7

Exercise 12.6: MDS ordinations of coral data

library(mvabund)data(tikus)tikusAbund= tikus$abund[1:20,]# for 1981 and 1983 data onlytikusAbund= tikusAbund[,apply(tikusAbund,2,sum)>0]# remove zerotons

Construct an MDS plot of the data, using the Bray-Curtis distance(default), and colour-code symbols by year of sampling.

tikus_mds=metaMDS(tikusAbund,trace=0)#> Square root transformation#> Wisconsin double standardizationplot(tikus_mds$points,pch=as.numeric(tikus$x$time),col=tikus$x$time)
plot of chunk ex12.6ordination
plot of chunk ex12.6ordination

Does this plot agree with the Warwick et al. (1990)interpretation? [Warwick et al. (1990) used this dataset and MDSordinations to argue that stress increases dispersion in coralcommunities]

Yes it does, 1981 (before El Niño disturbance) the points are closetogether in the middle of the ordination, 1983 (post disturbance) theyare spread out around the same point but way further apart, suggesting achange in dispersion.

Construct another MDS plot using the Euclidean distance onlog(y+1)-transformed data.

tikus_mdsEuc=metaMDS(log(tikusAbund+1),distance="euclidean",trace=0)plot(tikus_mdsEuc$points,pch=as.numeric(tikus$x$time),col=tikus$x$time)
plot of chunk ex12.6EucOrdination
plot of chunk ex12.6EucOrdination

Does this plot agree with the Warwick et al. (1990)interpretation?

Nope – this says the opposite, with much lower dispersion postdisturbance. It is suggestive of a location effect as well, that is, achange in mean abundance not just variability.

Use theplot.mvabund function to plot each coralresponse variable as a function of time. What is the main pattern thatyou see?

tikusMV=mvabund(tikusAbund)plot(tikusMV~tikus$x$time[1:20])#> Overlapping points were shifted along the y-axis to make them visible.#>#>  PIPING TO 2nd MVFACTOR#> Only the variables Heliopora.coerulea, Montipora.digitata, Favites.abdita, Favites.chinensis, Platygyra.daedalea, Montipora.foliosa, Pocillopora.damicornis, Acropora.cytherea, Acropora.hyacinthus, Acropora.formosa, Pocillopora.verrucosa, Acropora.pulchra were included in the plot#> (the variables with highest total abundance).
plot of chunk ex12.6plot
plot of chunk ex12.6plot

Convert the data into presence-absence and use the gllvm packageto construct an ordination

tikusPA= tikusAbundtikusPA[tikusPA>1]=1tikus_LVM=gllvm(tikusPA,num.lv=2,family="binomial",trace=TRUE,jitter.var=0.2)ordiplot.gllvm(tikus_LVM,s.col=as.numeric(tikus$x$time),biplot=TRUE,ind.spp=12)
plot of chunk ex12.6PA
plot of chunk ex12.6PA

Do assumptions appear reasonable? How would you interpret thisplot?

par(mfrow=c(1,3))plot(tikus_LVM,which=c(1,2,5))
plot of chunk ex12.6assumptions
plot of chunk ex12.6assumptions

Code Box 12.8: Studying each observation separately for the Irisdata

by(iris, iris$Species,function(dat){apply(dat[,1:4],2,mean) } )#> iris$Species: setosa#> Sepal.Length  Sepal.Width Petal.Length  Petal.Width#>        5.006        3.428        1.462        0.246#> ----------------------------------------------------------------------#> iris$Species: versicolor#> Sepal.Length  Sepal.Width Petal.Length  Petal.Width#>        5.936        2.770        4.260        1.326#> ----------------------------------------------------------------------#> iris$Species: virginica#> Sepal.Length  Sepal.Width Petal.Length  Petal.Width#>        6.588        2.974        5.552        2.026par(mfrow=c(2,2),mar=c(3,3,1,1),mgp=c(1.75,0.75,0))plot(Sepal.Length~Species,data=iris,xlab="")plot(Sepal.Width~Species,data=iris,xlab="")plot(Petal.Length~Species,data=iris,xlab="")plot(Petal.Width~Species,data=iris,xlab="")
plot of chunk code12.8
plot of chunk code12.8

#remove this chunk once gllvm has been updated on CRAN:


[8]ページ先頭

©2009-2025 Movatter.jp