sperrorest 3.0.5

Spatial Modeling Using Statistical Learning Techniques

Alexander Brenning, Patrick Schratz

Source: vignettes/spatial-modeling-use-case.Rmd

Introduction

Geospatial data scientists often make use of a variety of statistical and machine learning techniques for spatial prediction in applications such as landslide susceptibility modeling (Goetz et al. 2015) or habitat modeling (Knudby, Brenning, and LeDrew 2010). Novel and often more flexible techniques promise improved predictive performance as they are better able to represent nonlinear relationships or higher-order interactions between predictors than less flexible linear models.

Nevertheless, this increased flexibility comes with the risk of over-fitting to the training data. Since nearby spatial observations often tend to be more similar than distant ones, traditional random cross-validation is unable to detect this over-fitting whenever spatial observations are close to each other (e.g. Brenning (2005)). Spatial cross-validation addresses this by resampling the data not completely randomly, but using larger spatial regions. In some cases, spatial data is grouped: in remotely-sensed land use mapping, for example, grid cells belonging to the same field share the same management procedures and cultivation history, making them more similar to each other than to pixels from other fields with the same crop type.

This package provides a customizable toolkit for cross-validation (and bootstrap) estimation using a variety of spatial resampling schemes. Moreover, this toolkit can be extended to spatio-temporal data or other complex data structures. This vignette will walk you through a simple case study, crop classification in central Chile (Peña and Brenning 2015).

This vignette is based on code that Alex Brenning developed for his course on ‘Environmental Statistics and GeoComputation’, which he teaches at Friedrich Schiller University Jena, Germany. Please take a look at our program and spread the word!

Data and Packages

As a case study we will carry out a supervised classification analysis using remotely-sensed data to predict fruit-tree crop types in central Chile. This data set is a subsample of the data from (Peña and Brenning 2015).

library("sperrorest")
data("maipo", package = "sperrorest")

The remote-sensing predictor variables were derived from an image time series consisting of eight Landsat images acquired throughout the (southern hemisphere) growing season. The data set includes the following variables (a quick look at the data follows the list):

Response

  • croptype: response variable (factor) with 4 levels: ground truth information

Predictors

  • b[12-87]: spectral data, e.g. b82 = image date #8, spectral band #2
  • ndvi[01-08]: Normalized Difference Vegetation Index, e.g. #8 = image date #8
  • ndwi[01-08]: Normalized Difference Water Index, e.g. #8 = image date #8

Others

  • field: field identifier (grouping variable; not to be used as a predictor)
  • utmx, utmy: x/y location; not to be used as predictors
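
A quick look at the data can confirm this structure; a minimal sketch (output not shown here):

str(maipo, list.len = 6)    # overview of the first few variables
table(maipo$croptype)        # class distribution of the response
length(unique(maipo$field))  # number of fields (the grouping variable)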

All but the first four variables of the data set are predictors; their names are used to construct a formula object:

predictors <- colnames(maipo)[5:ncol(maipo)]
# Construct a formula:
fo <- as.formula(paste("croptype ~", paste(predictors, collapse = "+")))

Modeling

Here we will take a look at a few classification methods with varying degrees of computational complexity and flexibility. This should give you an idea of how different models are handled by {sperrorest}, depending on the characteristics of their fitting and prediction methods. Please refer to (James et al. 2013) for background information on the models used here.

Linear Discriminant Analysis (LDA)

LDA is simple and fast, and often performs surprisingly well if the problem at hand is ‘linear enough’. As a start, let’s fit a model with all predictors, using all available data:

library(MASS)
fit <- lda(fo, data = maipo)

Predict the croptype with the fitted model and calculate the misclassification error rate (MER) on the training sample:

pred <- predict(fit, newdata = maipo)$class
mean(pred != maipo$croptype)
## [1] 0.0437

But remember that this result is over-optimistic because we are re-using the training sample for model evaluation. We will soon show you how to do better with cross-validation.

We can also take a look at the confusion matrix, but again, this result is overly optimistic:

table(pred = pred, obs = maipo$croptype)
##        obs
## pred    crop1 crop2 crop3 crop4
##   crop1  1294     8     4    37
##   crop2    50  1054     4    44
##   crop3     0     0  1935     6
##   crop4    45   110    29  3093
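
If class-wise performance is of interest, producer’s and user’s accuracies can be derived from such a confusion matrix. A minimal sketch (the helper object tab is introduced here for illustration only):

tab <- table(pred = pred, obs = maipo$croptype)
diag(tab) / colSums(tab)  # producer's accuracy: correct / all observations of a class
diag(tab) / rowSums(tab)  # user's accuracy: correct / all predictions of a class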

Classification Tree

Classification and regression trees (CART) take a completely different approach: they are based on yes/no questions in the predictor variables and can be referred to as a binary partitioning technique. Fit a model with all predictors and default settings:

library("rpart")
fit <- rpart(fo, data = maipo)
## optional: view the classification tree
# par(xpd = TRUE)
# plot(fit)
# text(fit, use.n = TRUE)

Again, predict the croptype with the fitted model and calculate the average MER:

pred <- predict(fit, newdata = maipo, type = "class")
mean(pred != maipo$croptype)
## [1] 0.113

Here the predict call is slightly different. Again, we could calculate a confusion matrix:

table(pred = pred, obs = maipo$croptype)
##        obs
## pred    crop1 crop2 crop3 crop4
##   crop1  1204    66     0    54
##   crop2    47   871    38   123
##   crop3    38     8  1818    53
##   crop4   100   227   116  2950

Random Forest

Bagging, bundling and random forests build upon the CART technique by fitting many trees on bootstrap resamples of the original data set (Breiman 1996) (Breiman 2001) (Hothorn and Lausen 2005). They differ in that random forests also sample from the predictors, and bundling adds an ancillary classifier for improved classification. We will use the widely used random forest in its implementation in the ranger package.

library("ranger")
fit <- ranger(fo, data = maipo)
fit
## Ranger result
##
## Call:
##  ranger(fo, data = maipo)
##
## Type:                             Classification
## Number of trees:                  500
## Sample size:                      7713
## Number of independent variables:  64
## Mtry:                             8
## Target node size:                 1
## Variable importance mode:         none
## Splitrule:                        gini
## OOB prediction error:             0.53 %

Let’s take a look at the MER achieved on the training sample:

pred <- predict(fit, data = maipo, type = "response")
mean(pred$predictions != maipo$croptype)
## [1] 0
table(pred = pred$predictions, obs = maipo$croptype)
##        obs
## pred    crop1 crop2 crop3 crop4
##   crop1  1389     0     0     0
##   crop2     0  1172     0     0
##   crop3     0     0  1972     0
##   crop4     0     0     0  3180

Isn’t this amazing? All grid cells were correctly classified! Even the OOB (out-of-bag) estimate of the error rate is < 1%. Too good to be true? We’ll see…

Cross-Validation Estimation of Predictive Performance

Of course we can’t take the MER on the training set too seriously: it is biased. But we’ve heard of cross-validation, in which disjoint subsets are used for model training and testing. Let’s use {sperrorest} for cross-validation.

Also, at this point we should highlight that the observations in this data set are pixels, and multiple grid cells belong to the same field. In a predictive situation, and when field boundaries are known (as is the case here), we would want to predict the same class for all grid cells that belong to the same field. Here we will use a majority filter, which ensures that the final predicted class of each field is the croptype most often predicted within that field.
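
To illustrate the idea before it reappears inside the wrapper functions below, here is a self-contained toy sketch of majority filtering: within each group, all predictions are replaced by the group’s most frequent predicted class. (Note that which.max breaks ties by order, whereas the wrappers below use nnet’s which.is.max, which breaks ties at random.)

# Toy sketch of a majority filter over a grouping factor:
pred <- factor(c("a", "a", "b", "b", "b", "a"))
fac <- factor(c("f1", "f1", "f1", "f2", "f2", "f2"))
for (lev in levels(fac)) {
  tab <- table(pred[fac == lev])
  pred[fac == lev] <- names(tab)[which.max(tab)]
}
pred  # group f1 -> "a" (2 of 3), group f2 -> "b" (2 of 3)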

Linear Discriminant Analysis (LDA)

First, we need to create a wrapper predict method for LDA for sperrorest(). This is necessary in order to accommodate the majority filter, and also because class predictions from lda’s predict method are hidden in the $class component of the returned object.

lda_predfun <- function(object, newdata, fac = NULL) {
  library(nnet)
  majority <- function(x) {
    levels(x)[which.is.max(table(x))]
  }
  majority_filter <- function(x, fac) {
    for (lev in levels(fac)) {
      x[fac == lev] <- majority(x[fac == lev])
    }
    x
  }
  pred <- predict(object, newdata = newdata)$class
  if (!is.null(fac)) pred <- majority_filter(pred, newdata[, fac])
  return(pred)
}

To ensure that custom predict functions will work with sperrorest(), we need to wrap all custom functions in one single function. Otherwise, sperrorest() might fail during execution.

Finally, we can run sperrorest() with a non-spatial sampling setting (partition_cv()). In this example we use a ‘3 repetitions, 5 folds’ setup. In practice, we recommend using 100 repetitions to reduce the influence of random partitioning.

res_lda_nsp <- sperrorest(fo,
  data = maipo, coords = c("utmx", "utmy"),
  model_fun = lda,
  pred_fun = lda_predfun,
  pred_args = list(fac = "field"),
  smp_fun = partition_cv,
  smp_args = list(repetition = 1:3, nfold = 5),
  mode_rep = "sequential",
  progress = FALSE
)

Note that we’re running this in sequential mode, since it’s not worth the trouble of parallelizing the execution of three repetitions with such a short runtime.

So what have we got:

summary(res_lda_nsp$error_rep)
##                    mean       sd   median      IQR
## train_error    3.35e-02 0.000357 3.33e-02 0.000324
## train_accuracy 9.67e-01 0.000357 9.67e-01 0.000324
## train_events   4.69e+03 0.000000 4.69e+03 0.000000
## train_count    3.09e+04 0.000000 3.09e+04 0.000000
## test_error     3.93e-02 0.003323 3.97e-02 0.003306
## test_accuracy  9.61e-01 0.003323 9.60e-01 0.003306
## test_events    1.17e+03 0.000000 1.17e+03 0.000000
## test_count     7.71e+03 0.000000 7.71e+03 0.000000
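
Besides this summary, the repetition-level results can be inspected directly: assuming error_rep is a data frame with one row per repetition (as in current sperrorest versions), the individual test error rates can be pulled out like this:

# Per-repetition test error rates (one row per repetition in error_rep):
res_lda_nsp$error_rep$test_error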

To run a spatial cross-validation at the field level, we can use partition_factor_cv() as the sampling function. Since we are using 5 folds, we get a coarse 80/20 split of our data: 80% will be used for training, 20% for testing our trained model.

To see where our training and test sets are located in each fold, we can plot them. The red points represent the test set in each fold, the black points the training set. Note that because we plot over 7000 points, overplotting occurs; since the red crosses are drawn after the black ones, it looks as if far more than ~20% of the points were red.

resamp <- partition_factor_cv(maipo, nfold = 5, repetition = 1:1, fac = "field")
plot(resamp, maipo, coords = c("utmx", "utmy"))

Subsequently, we have to specify the field grouping variable (fac = "field") in the prediction arguments (pred_args) and sampling arguments (smp_args) of sperrorest().

res_lda_sp <- sperrorest(fo,
  data = maipo, coords = c("utmx", "utmy"),
  model_fun = lda,
  pred_fun = lda_predfun,
  pred_args = list(fac = "field"),
  smp_fun = partition_factor_cv,
  smp_args = list(fac = "field", repetition = 1:3, nfold = 5),
  mode_rep = "sequential",
  benchmark = TRUE, progress = FALSE
)
res_lda_sp$benchmark$runtime_performance
## Time difference of 6.45 secs
summary(res_lda_sp$error_rep)
##                    mean      sd   median     IQR
## train_error    3.10e-02 0.00185 3.16e-02 0.00177
## train_accuracy 9.69e-01 0.00185 9.68e-01 0.00177
## train_events   4.69e+03 0.00000 4.69e+03 0.00000
## train_count    3.09e+04 0.00000 3.09e+04 0.00000
## test_error     6.23e-02 0.00724 6.47e-02 0.00694
## test_accuracy  9.38e-01 0.00724 9.35e-01 0.00694
## test_events    1.17e+03 0.00000 1.17e+03 0.00000
## test_count     7.71e+03 0.00000 7.71e+03 0.00000

Random Forest

In the case of Random Forest, the customized pred_fun looks as follows. It is only required because of the majority filter; without it, we could simply omit the pred_fun and pred_args arguments below.

rf_predfun <- function(object, newdata, fac = NULL) {
  library(nnet)
  majority <- function(x) {
    levels(x)[which.is.max(table(x))]
  }
  majority_filter <- function(x, fac) {
    for (lev in levels(fac)) {
      x[fac == lev] <- majority(x[fac == lev])
    }
    x
  }
  pred <- predict(object, data = newdata)
  if (!is.null(fac)) pred <- majority_filter(pred$predictions, newdata[, fac])
  return(pred)
}
res_rf_sp <- sperrorest(fo,
  data = maipo, coords = c("utmx", "utmy"),
  model_fun = ranger,
  pred_fun = rf_predfun,
  pred_args = list(fac = "field"),
  smp_fun = partition_factor_cv,
  smp_args = list(
    fac = "field",
    repetition = 1:3, nfold = 5
  ),
  mode_rep = "sequential",
  benchmark = TRUE, progress = 2
)
summary(res_rf_sp$error_rep)
##                    mean      sd   median    IQR
## train_error    0.00e+00 0.00000 0.00e+00 0.0000
## train_accuracy 1.00e+00 0.00000 1.00e+00 0.0000
## train_events   4.69e+03 0.00000 4.69e+03 0.0000
## train_count    3.09e+04 0.00000 3.09e+04 0.0000
## test_error     9.01e-02 0.00943 9.09e-02 0.0094
## test_accuracy  9.10e-01 0.00943 9.09e-01 0.0094
## test_events    1.17e+03 0.00000 1.17e+03 0.0000
## test_count     7.71e+03 0.00000 7.71e+03 0.0000
summary(res_rf_sp$error_rep)["test_accuracy",]
##               mean      sd median    IQR
## test_accuracy 0.91 0.00943  0.909 0.0094

What a surprise! {ranger}’s classification is not that good after all, if we acknowledge that in ‘real life’ we wouldn’t be making predictions in situations where the class membership of other grid cells in the same field is known at the training stage. So spatial dependence does matter.

Usage Advice

Given all the different sampling functions and the required custom predict functions (e.g. rf_predfun()) in this example, you might be a little confused about which function to use for your use case. If you want to do a “normal”, i.e. non-spatial, cross-validation, we recommend using partition_cv() as smp_fun in sperrorest(). If you want to perform a spatial cross-validation (and you do not have a grouping structure like the fields in this example), partition_kmeans() takes care of spatial partitioning. In most cases you can simply use the generic predict() method for your model (i.e. skip this argument in sperrorest()). Check our “custom model and predict functions” vignette for more information on cases where adjustments are needed.
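
For illustration, such a k-means-based spatial partitioning can be set up and inspected just like the field-level partitioning above; a minimal sketch using the same coordinates (not run here):

# Spatial partitioning by k-means clustering of the coordinates:
resamp_kmeans <- partition_kmeans(maipo, coords = c("utmx", "utmy"), nfold = 5, repetition = 1:1)
plot(resamp_kmeans, maipo, coords = c("utmx", "utmy"))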

For further questions or issues, please open an issue in the GitHub repo.

References

Breiman, Leo. 1996. “Bagging Predictors.” Machine Learning 24 (2): 123–40. https://doi.org/10.1007/bf00058655.
———. 2001. “Random Forests.” Machine Learning 45 (1): 5–32. https://doi.org/10.1023/a:1010933404324.
Brenning, A. 2005. “Spatial Prediction Models for Landslide Hazards: Review, Comparison and Evaluation.” Natural Hazards and Earth System Science 5 (6): 853–62. https://doi.org/10.5194/nhess-5-853-2005.
Goetz, J. N., A. Brenning, H. Petschko, and P. Leopold. 2015. “Evaluating Machine Learning and Statistical Prediction Techniques for Landslide Susceptibility Modeling.” Computers & Geosciences 81 (August): 1–11. https://doi.org/10.1016/j.cageo.2015.04.007.
Hothorn, Torsten, and Berthold Lausen. 2005. “Bundling Classifiers by Bagging Trees.” Computational Statistics & Data Analysis 49 (4): 1068–78. https://doi.org/10.1016/j.csda.2004.06.019.
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning. Springer New York. https://doi.org/10.1007/978-1-4614-7138-7.
Knudby, Anders, Alexander Brenning, and Ellsworth LeDrew. 2010. “New Approaches to Modelling Fish–Habitat Relationships.” Ecological Modelling 221 (3): 503–11. https://doi.org/10.1016/j.ecolmodel.2009.11.008.
Peña, M. A., and A. Brenning. 2015. “Assessing Fruit-Tree Crop Classification from Landsat-8 Time Series for the Maipo Valley, Chile.” Remote Sensing of Environment 171 (December): 234–44. https://doi.org/10.1016/j.rse.2015.10.029.
