spind is a package dedicated to removing the spectre of spatial autocorrelation in your species distribution models (hereafter referred to as SDMs). It contains many of the tools you need to calculate probabilities of occurrence and/or abundances, assess model performance, and conduct multimodel inference for 2-D gridded datasets using methods that are robust to spatial autocorrelation.
The theory underlying the use of generalized estimating equations (GEEs), wavelet revised models (WRMs), and many of the other tools in this package is covered elsewhere in the literature, and for the purposes of this vignette, we assume that you have already read those papers. If you haven’t, citations are included in the footnotes of this vignette as well as in the documentation of each function. We also assume that you have a working knowledge of R. This vignette focuses on demonstrating how to use this package to create an SDM and assess its accuracy. Along the way, we will use a couple of different data sets to examine how these functions work and investigate how one might use them to create a robust SDM.
This package utilizes the functions already written for GEEs from the packages gee [2] and geepack [3] and adapts them for easy use in the context of an SDM. Let’s start with a fairly simple GEE using the simulated musdata data set included in the package.
library(spind)
data(musdata)
data(carlinadata)
# Examine the structure to familiarize yourself with the data
?musdata
head(musdata)
?carlinadata
head(carlinadata)
# Next, fit a simple GEE and view the output
coords<-musdata[ ,4:5]
mgee<-GEE(musculus ~ pollution + exposure, family="poisson", data=musdata,
coord=coords, corstr="fixed", plot=TRUE, scale.fix=FALSE)
summary(mgee, printAutoCorPars=TRUE)
#>
#> Call:
#> GEE(formula = musculus ~ pollution + exposure, family = "poisson",
#> data = musdata, coord = coords, corstr = "fixed", plot = TRUE,
#> scale.fix = FALSE)
#> ---
#> Coefficients:
#> Estimate Std.Err z value Pr(>|z|)
#> (Intercept) -1.90475 1.31091 -1.4530 0.1462252
#> pollution 3.36216 0.91416 3.6779 0.0002352 ***
#> exposure -1.46348 0.88010 -1.6629 0.0963410 .
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> ---
#> QIC: 1139.159
#> ---
#> Autocorrelation of GLM residuals
#> [1] 0.685338504 0.509680590 0.363021118 0.247398654 0.144726020
#> [6] 0.084220961 0.050228656 0.022369044 -0.001985639 -0.027296083
#>
#> Autocorrelation of GEE residuals
#> [1] -0.001277974 -0.004261554 0.045280260 0.022738750 0.005821352
#> [6] 0.004289166 0.008311357 0.003437398 0.001030847 -0.010359040
#> ---
#> Autocorrelation parameters from fixed model
#> [1] "a=alpha^(d^v) , alpha=0.685 , v=1.093"
predictions<-predict(mgee, newdata=musdata)
As you can see, this package includes S3 methods for summary and predict. These are useful in evaluating model fit and autocorrelation of residuals compared to a non-spatial model (in this case, a GLM with the same family as the GEE). Additionally, the plot argument in GEE can be used to visually inspect the autocorrelation of the residuals from each regression. Note that a QIC (quasi-information criterion) score is reported as opposed to AIC. This is calculated based on the method described in Hardin & Hilbe [4,5] and is implemented using the function qic.calc.
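As a rough guide to what a QIC score contains, the sketch below assumes the usual definition QIC = -2 * Q + 2 * trace term, where Q is the quasi-likelihood and the trace term acts as a penalty for model complexity. It only illustrates that formula and is not the code inside qic.calc; the helper names qic_from_parts and implied_trace are hypothetical. The numbers are the QIC and Quasi.Lik values that step.spind reports for the final low ~ bwt model later in this vignette.

```r
# Hypothetical helpers illustrating QIC = -2 * Q + 2 * trace term
# (an assumed textbook definition, not spind's internal code)
qic_from_parts <- function(quasi_lik, trace_term) {
  -2 * quasi_lik + 2 * trace_term
}

# Rearranged: the penalty term implied by a reported QIC and quasi-likelihood
implied_trace <- function(qic, quasi_lik) {
  (qic + 2 * quasi_lik) / 2
}

qic <- 110.9656        # QIC printed for low ~ bwt
quasi_lik <- -52.86072 # Quasi.Lik printed for the same model

tr <- implied_trace(qic, quasi_lik)
tr                              # about 2.62
qic_from_parts(quasi_lik, tr)   # recovers the reported QIC
```

Under this reading, most of the QIC comes from the quasi-likelihood, and the penalty term is close to the number of regression coefficients.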
Note that trying to fit GEEs with corstr="fixed" to large data sets (i.e. when the number of observations approaches sqrt(.Machine$integer.max)) will result in errors, as the resulting variance-covariance matrices will be too large to be handled in R (you may well run into problems before this point due to memory allocation issues). This is where fitting clustered models can come in handy, as they work with smaller, more manageable matrices. These can be specified by changing the corstr argument to either "quadratic" or "exchangeable".
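To get a feel for where that limit sits, the following base R sketch computes the threshold and a back-of-the-envelope memory estimate for one dense n x n matrix of doubles. The helper dense_matrix_gib is hypothetical, and the estimate ignores any additional working copies made during fitting.

```r
# Number of observations at which an n x n matrix reaches the integer limit
n_limit <- floor(sqrt(.Machine$integer.max))
n_limit  # 46340 where R integers are 32-bit

# Approximate size in GiB of one dense n x n matrix of doubles (8 bytes per cell)
dense_matrix_gib <- function(n) n^2 * 8 / 1024^3

dense_matrix_gib(400)      # a 400-observation data set such as musdata: about 0.0012 GiB
dense_matrix_gib(n_limit)  # about 16 GiB for a single matrix
```

In practice, memory is therefore likely to become the binding constraint well before the number of observations reaches n_limit.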
Next, we’ll examine the other main model that is introduced in this package: the Wavelet Revised Model (WRM). WRMs are implemented using wavelet transforms from the waveslim package [7]. Let’s start with a fairly simple WRM using the same musdata data set as above.
mwrm<-WRM(musculus ~ pollution + exposure, "poisson", musdata,
coord=coords, level=1, plot=TRUE)
summary(mwrm)
#>
#> Call:
#> WRM(formula = musculus ~ pollution + exposure, family = "poisson",
#> data = musdata, coord = coords, level = 1, plot = TRUE)
#>
#> Pearson Residuals:
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> -1.6140000 -0.3057000 0.0047510 -0.0003873 0.3039000 3.0620000
#> ---
#> Coefficients:
#> Estimate Std.Err z value Pr(>|z|)
#> (Intercept) -1.9360 1.9177 -1.0095 0.312717
#> pollution 3.1841 1.2251 2.5991 0.009348 **
#> exposure -1.2286 1.5063 -0.8156 0.414723
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> ---
#> Number of observations n: 400 , n.eff: 300 , AIC: 1110.845
#>
#> Number of iterations: 7
#> ---
#> Autocorrelation of glm.residuals
#> [1] 0.685338504 0.509680590 0.363021118 0.247398654 0.144726020
#> [6] 0.084220961 0.050228656 0.022369044 -0.001985639 -0.027296083
#> Autocorrelation of wavelet.residuals
#> [1] 0.024855393 -0.086311686 0.007820356 0.024501828 -0.016578686
#> [6] 0.002798656 -0.002977017 -0.004611334 0.018150352 -0.008727321
predictions<-predict(mwrm, newdata=musdata)
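Let’s try padding with mean values. Note that, for this particular model, the summary below is identical to that of the unpadded fit above; see ?WRM for details on the pad argument.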
# Padding with mean values
padded.mwrm<-WRM(musculus ~ pollution + exposure, "poisson", musdata,
coord=coords, level=1, pad=list(padform=1), plot=TRUE)
summary(padded.mwrm)
#>
#> Call:
#> WRM(formula = musculus ~ pollution + exposure, family = "poisson",
#> data = musdata, coord = coords, level = 1, pad = list(padform = 1),
#> plot = TRUE)
#>
#> Pearson Residuals:
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> -1.6140000 -0.3057000 0.0047510 -0.0003873 0.3039000 3.0620000
#> ---
#> Coefficients:
#> Estimate Std.Err z value Pr(>|z|)
#> (Intercept) -1.9360 1.9177 -1.0095 0.312717
#> pollution 3.1841 1.2251 2.5991 0.009348 **
#> exposure -1.2286 1.5063 -0.8156 0.414723
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> ---
#> Number of observations n: 400 , n.eff: 300 , AIC: 1110.845
#>
#> Number of iterations: 7
#> ---
#> Autocorrelation of glm.residuals
#> [1] 0.685338504 0.509680590 0.363021118 0.247398654 0.144726020
#> [6] 0.084220961 0.050228656 0.022369044 -0.001985639 -0.027296083
#> Autocorrelation of wavelet.residuals
#> [1] 0.024855393 -0.086311686 0.007820356 0.024501828 -0.016578686
#> [6] 0.002798656 -0.002977017 -0.004611334 0.018150352 -0.008727321
padded.predictions<-predict(padded.mwrm, newdata=musdata)
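To make "padding with mean values" concrete, here is a minimal base R sketch that embeds a small grid in a larger square matrix filled with the grid mean. It assumes that the padded side length is the next power of two, which is a common requirement for discrete wavelet transforms; the helper pad_with_mean is hypothetical and is not the routine WRM uses internally.

```r
# Hypothetical helper: centre matrix m inside a size x size matrix of mean(m)
pad_with_mean <- function(m, size) {
  stopifnot(is.matrix(m), size >= max(dim(m)))
  out <- matrix(mean(m), nrow = size, ncol = size)
  r0 <- floor((size - nrow(m)) / 2)  # row offset of the original data
  c0 <- floor((size - ncol(m)) / 2)  # column offset of the original data
  out[r0 + seq_len(nrow(m)), c0 + seq_len(ncol(m))] <- m
  out
}

grid <- matrix(1:20, nrow = 4, ncol = 5)
size <- 2^ceiling(log2(max(dim(grid))))  # next power of two (assumption)
padded <- pad_with_mean(grid, size)

dim(padded)   # 8 x 8
mean(padded)  # unchanged by the padding: 10.5
```

Because the padding cells carry the mean of the data, they leave the overall mean untouched, which is the appeal of this choice over padding with zeros.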
WRM has many of the same features as GEE. Setting plot=TRUE allows you to visually examine the autocorrelation of residuals from a GLM of the same error family as your WRM. S3 methods for predict and summary allow you to examine outputs from the model using the same code as you might use for a GLM. However, note that WRM reports an AIC score, rather than a QIC score as in the GEE.
WRM has a number of other model-specific functions that you may find useful in diagnosing model fit and understanding your results. For example, you might want to plot the variance or covariance of each of your variables as a function of level. The covar.plot function allows you to visually examine the wavelet relationships from your model. For this example, we will switch to the carlinadata data set.
coords<-carlinadata[ ,4:5]
covar.plot(carlina.horrida ~ aridity + land.use - 1,
data=carlinadata, coord=coords, wavelet="d4",
wtrafo='modwt', plot='covar')
#> $result
#> [,1] [,2] [,3] [,4] [,5]
#> carlina.horrida-aridity 0.0368 0.0450 0.0623 0.0780 0.0466
#> carlina.horrida-land.use 0.4782 0.1191 0.0332 0.0126 0.0055
covar.plot(carlina.horrida ~ aridity + land.use - 1,
data=carlinadata, coord=coords, wavelet="d4",
wtrafo='modwt', plot='var')
#> $result
#> [,1] [,2] [,3] [,4] [,5]
#> carlina.horrida 0.7235 0.1792 0.0628 0.0242 0.0093
#> aridity 0.0691 0.1025 0.2028 0.3588 0.2657
#> land.use 0.7556 0.1851 0.0420 0.0119 0.0044
You may also want to view the smooth components of your wavelets at different scale levels. For this, we offer the upscale function, which allows you to visually examine your matrices for a number of different levels of scale, which controls the resolution of the grid cells in your data set [8]. It also offers the option to adjust padding settings so you can see how that influences your smooth components as well. The default is to pad with the mean of your input vector, but this can easily be changed using the pad argument, which works the same way as in the other WRM functions. Here is a quick example using the carlinadata data set.
upscale(carlinadata$land.use, carlinadata$x, carlinadata$y)
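For intuition about what a coarser level of scale looks like, the following base R sketch averages non-overlapping 2^level x 2^level blocks of a matrix. This is only a loose analogy (block means are proportional to the smooth coefficients of a Haar wavelet) and not what upscale computes for other wavelets or with padding; block_mean is a hypothetical helper.

```r
# Hypothetical helper: average non-overlapping 2^level x 2^level blocks
block_mean <- function(m, level) {
  b <- 2^level
  stopifnot(nrow(m) %% b == 0, ncol(m) %% b == 0)
  rows <- as.vector((row(m) - 1) %/% b)  # block index of each cell (rows)
  cols <- as.vector((col(m) - 1) %/% b)  # block index of each cell (columns)
  unname(tapply(as.vector(m), list(rows, cols), mean))
}

m <- matrix(1:16, nrow = 4, ncol = 4)
coarse1 <- block_mean(m, 1)  # 2 x 2 matrix of block means
coarse2 <- block_mean(m, 2)  # 1 x 1 matrix holding the overall mean
coarse1
```

Each increase in level halves the resolution in both directions, so fine-grained variation is progressively smoothed away.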
spind provides a couple of frameworks for conducting multi-model inference analyses and some helper functions that we hope will make your life easier when examining the results. The first that we’ll examine here is the step.spind function, which implements step-wise model selection. The process is loosely based on MASS::stepAIC and stats::step, but is specific to classes GEE and WRM. For GEEs, step.spind uses models with the lowest QIC scores to determine what the next step will be. For WRMs, you have the option of using AIC or AICc (AIC corrected for small sample sizes) using the logical AICc argument.
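The relationship between AIC and AICc can be checked by hand. The sketch below uses the standard small-sample correction, AICc = AIC + 2k(k + 1) / (n - k - 1), with the log-likelihood and number of observations that this vignette's example output reports for the final low ~ bwt WRM. The parameter count k = 3 (intercept, slope, and residual variance) is our assumption, and aicc is a hypothetical helper rather than a spind function.

```r
# Hypothetical helper: AICc from a log-likelihood, parameter count k, and sample size n
aicc <- function(log_lik, k, n) {
  aic <- -2 * log_lik + 2 * k
  aic + (2 * k * (k + 1)) / (n - k - 1)
}

log_lik <- -36.00167  # LogLik printed for low ~ bwt
k <- 3                # assumed: intercept, bwt, residual variance
n <- 189              # number of observations in birthwt

aic_value  <- -2 * log_lik + 2 * k
aicc_value <- aicc(log_lik, k, n)
round(c(AIC = aic_value, AICc = aicc_value), 5)
```

With these inputs the sketch reproduces the AIC (78.00334) and AICc (78.13307) printed for that model, and the correction shrinks toward zero as n grows relative to k.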
Currently, the function only supports backwards model selection. In other words, you have to start with all of the variables in your model formula and remove them in a stepwise fashion. We hope to add forward model selection methods shortly. Additionally, step.spind is written to always respect the hierarchy of variables in the model, and the user cannot currently override this. For example, step.spind would not remove race while retaining I(race^2). We may change that in the future, but it will remain like this at least until the next major release. Currently, it recognizes polynomial variables by matching variable names located inside of I(var^some_power) and interaction terms by searching for var1:var2 in the model terms. If you want to use a higher order polynomial variable and are not worried about the variable hierarchy, you can create a separate variable (e.g. race_2) and use that in the model.
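The name matching described above can be sketched with base R alone. The snippet below pulls the term labels out of a formula, finds polynomial terms of the form I(var^power) and interaction terms of the form var1:var2, and marks the variables they depend on as protected from removal. It is an illustration under those assumptions, not the code used inside step.spind, and the formula (including the age:lwt interaction) is made up for the example.

```r
# Made-up formula with one polynomial term and one interaction term
f <- low ~ age + lwt + race + I(race^2) + age:lwt
labs <- attr(terms(f), "term.labels")

# Polynomial terms look like I(var^power); extract the variable inside
poly_pattern <- "^I\\((.+)\\^[0-9.]+\\)$"
poly_terms <- grep(poly_pattern, labs, value = TRUE)
poly_base  <- sub(poly_pattern, "\\1", poly_terms)

# Interaction terms contain ":"; split them into their component variables
int_terms <- grep(":", labs, fixed = TRUE, value = TRUE)
int_base  <- unique(unlist(strsplit(int_terms, ":", fixed = TRUE)))

# Lower-order terms that must stay while a higher-order term is in the model
protected <- union(poly_base, int_base)
removable <- setdiff(labs, protected)
removable
```

Here only I(race^2) and age:lwt are candidates for removal in the first step; race, age, and lwt become removable only after the higher-order terms that contain them have been dropped.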
We’ll go through an example of step.spind using a GEE on the birthwt data set in the MASS package below. The data in birthwt aren’t at all related to SDMs and are not spatially structured, but we hope that in using this data set, we will demonstrate how this function can work with many types of data sets.
# For demonstration only. We are artificially imposing a grid structure
# on data that is not actually spatial data
library(MASS)
data(birthwt)
x<-rep(1:14,14)
y<-as.integer(gl(14,14))
coords<-cbind(x[-(190:196)],y[-(190:196)])
formula<-formula(low ~ age + lwt + race + smoke + ftv + bwt + I(race^2))
mgee<-GEE(formula, family="gaussian", data=birthwt,
coord=coords, corstr="fixed",scale.fix=TRUE)
mwrm<-WRM(formula, family="gaussian", data=birthwt,
coord=coords, level=1)
ssgee<-step.spind(mgee, birthwt)
#> Iteration: 1
#> Single term deletions
#> Deleted Term: age
#> --------------------
#> Deleted.Vars QIC Quasi.Lik
#> 1 <none> 112.4177 -52.68206
#> 2 age 111.9314 -52.67027
#> 3 lwt 112.3100 -52.74725
#> 4 race 112.1267 -52.68782
#> 5 smoke 112.1349 -52.68652
#> 6 ftv 112.1632 -52.70359
#> 7 bwt 299.1973 -121.62394
#> 8 I(race^2) 112.1329 -52.69176
#>
#> Iteration: 2
#> Single term deletions
#> Deleted Term: race
#> --------------------
#> Deleted.Vars QIC Quasi.Lik
#> 1 <none> 111.9314 -52.67027
#> 2 lwt 111.8146 -52.72965
#> 3 race 111.6474 -52.67038
#> 4 smoke 111.6566 -52.67100
#> 5 ftv 111.7364 -52.72503
#> 6 bwt 299.0924 -121.56174
#> 7 I(race^2) 111.6567 -52.67607
#>
#> -----
#> Model hierarchy violated by last removal
#> New deleted term: smoke
#> Previously deleted term added back into model
#> -----
#> Iteration: 3
#> Single term deletions
#> Deleted Term: race
#> --------------------
#> Deleted.Vars QIC Quasi.Lik
#> 1 <none> 111.6566 -52.67100
#> 2 lwt 111.5311 -52.72809
#> 3 race 111.3802 -52.67366
#> 4 ftv 111.4711 -52.72790
#> 5 bwt 300.0954 -123.16568
#> 6 I(race^2) 111.3877 -52.67890
#>
#> -----
#> Model hierarchy violated by last removal
#> New deleted term: I(race^2)
#> Previously deleted term added back into model
#> -----
#> Iteration: 4
#> Single term deletions
#> Deleted Term: ftv
#> --------------------
#> Deleted.Vars QIC Quasi.Lik
#> 1 <none> 111.3877 -52.67890
#> 2 lwt 111.2494 -52.72805
#> 3 race 111.3122 -52.76311
#> 4 ftv 111.2088 -52.73630
#> 5 bwt 298.6147 -123.33017
#>
#> Iteration: 5
#> Single term deletions
#> Deleted Term: lwt
#> --------------------
#> Deleted.Vars QIC Quasi.Lik
#> 1 <none> 111.2088 -52.73630
#> 2 lwt 111.0717 -52.78793
#> 3 race 111.1415 -52.82335
#> 4 bwt 298.6351 -123.35038
#>
#> Iteration: 6
#> Single term deletions
#> Deleted Term: race
#> --------------------
#> Deleted.Vars QIC Quasi.Lik
#> 1 <none> 111.0717 -52.78793
#> 2 race 110.9656 -52.86072
#> 3 bwt 295.8817 -123.12477
#>
#> Iteration: 7
#> Single term deletions
#> Deleted Term: <none>
#> --------------------
#> Deleted.Vars QIC Quasi.Lik
#> 1 <none> 110.9656 -52.86072
#> 2 bwt 296.2879 -123.98743
#>
#>
#> ---------------
#> Best model found:
#> low ~ bwt
sswrm<-step.spind(mwrm, birthwt, AICc=TRUE)
#> Iteration: 1
#> Single term deletions
#> Deleted Term: race
#> --------------------
#> Deleted.Vars LogLik AIC AICc
#> 1 <none> -36.27411 90.54822 91.55381
#> 2 age -36.27965 88.55930 89.35930
#> 3 lwt -36.30071 88.60143 89.40143
#> 4 race -36.03376 88.06752 88.86752
#> 5 smoke -36.13543 88.27087 89.07087
#> 6 ftv -36.25766 88.51532 89.31532
#> 7 bwt -88.97789 193.95579 194.75579
#> 8 I(race^2) -36.03879 88.07758 88.87758
#>
#> -----
#> Model hierarchy violated by last removal
#> New deleted term: I(race^2)
#> Previously deleted term added back into model
#> -----
#> Iteration: 2
#> Single term deletions
#> Deleted Term: smoke
#> --------------------
#> Deleted.Vars LogLik AIC AICc
#> 1 <none> -36.03879 88.07758 88.87758
#> 2 age -36.04158 86.08317 86.70195
#> 3 lwt -36.09072 86.18144 86.80022
#> 4 race -36.11776 86.23552 86.85430
#> 5 smoke -35.90087 85.80174 86.42052
#> 6 ftv -36.02003 86.04007 86.65885
#> 7 bwt -88.71971 191.43941 192.05820
#>
#> Iteration: 3
#> Single term deletions
#> Deleted Term: ftv
#> --------------------
#> Deleted.Vars LogLik AIC AICc
#> 1 <none> -35.90087 85.80174 86.42052
#> 2 age -35.90797 83.81593 84.27747
#> 3 lwt -35.95683 83.91366 84.37520
#> 4 race -35.97398 83.94797 84.40950
#> 5 ftv -35.88997 83.77995 84.24149
#> 6 bwt -90.79731 193.59462 194.05616
#>
#> Iteration: 4
#> Single term deletions
#> Deleted Term: age
#> --------------------
#> Deleted.Vars LogLik AIC AICc
#> 1 <none> -35.88997 83.77995 84.24149
#> 2 age -35.89084 81.78169 82.10956
#> 3 lwt -35.94335 81.88670 82.21457
#> 4 race -35.96301 81.92602 82.25389
#> 5 bwt -90.81649 191.63298 191.96085
#>
#> Iteration: 5
#> Single term deletions
#> Deleted Term: lwt
#> --------------------
#> Deleted.Vars LogLik AIC AICc
#> 1 <none> -35.89084 81.78169 82.10956
#> 2 lwt -35.94877 79.89754 80.11494
#> 3 race -35.95486 79.90972 80.12711
#> 4 bwt -91.14883 190.29766 190.51505
#>
#> Iteration: 6
#> Single term deletions
#> Deleted Term: race
#> --------------------
#> Deleted.Vars LogLik AIC AICc
#> 1 <none> -35.94877 79.89754 80.11494
#> 2 race -36.00167 78.00334 78.13307
#> 3 bwt -91.88479 189.76958 189.89931
#>
#> Iteration: 7
#> Single term deletions
#> Deleted Term: <none>
#> --------------------
#> Deleted.Vars LogLik AIC AICc
#> 1 <none> -36.00167 78.00334 78.13307
#> 2 bwt -92.26134 188.52268 188.58719
#>
#>
#> ---------------
#> Best model found:
#> low ~ bwt
best.mgee<-GEE(ssgee$model, family = "gaussian", data=birthwt,
coord=coords, corstr="fixed",scale.fix=TRUE)
best.wrm<-WRM(sswrm$model, family="gaussian", data=birthwt,
coord=coords, level = 1)
summary(best.mgee, printAutoCorPars=FALSE)
#>
#> Call:
#> GEE(formula = ssgee$model, family = "gaussian", data = birthwt,
#> coord = coords, corstr = "fixed", scale.fix = TRUE)
#> ---
#> Coefficients:
#> Estimate Std.Err t value Pr(>|t|)
#> (Intercept) 1.2492e+00 4.9121e-01 2.5430 0.01099 *
#> bwt -3.0919e-04 6.5913e-05 -4.6909 2.72e-06 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> ---
#> QIC: 110.9656
#> ---
#> Autocorrelation of GLM residuals
#> [1] 0.837748633 0.724407532 0.602588671 0.500754270 0.387294592
#> [6] 0.275433941 0.147728669 0.008716423 -0.130798183 -0.268641655
#>
#> Autocorrelation of GEE residuals
#> [1] 0.43453709 0.35186795 0.27457621 0.21231229 0.10255028
#> [6] 0.08028419 0.07174312 0.04070057 0.02919975 -0.06904364
summary(best.wrm)
#>
#> Call:
#> WRM(formula = sswrm$model, family = "gaussian", data = birthwt,
#> coord = coords, level = 1)
#>
#> Pearson Residuals:
#> Min. 1st Qu. Median Mean 3rd Qu. Max.
#> -0.32380 -0.03533 -0.01311 0.00000 0.03328 0.42380
#> ---
#> Coefficients:
#> Estimate Std.Err t value Pr(>|t|)
#> (Intercept) 1.2809e+00 2.3089e-09 5.5477e+08 < 2.2e-16 ***
#> bwt -3.4305e-04 7.4898e-06 -4.5802e+01 < 2.2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> ---
#> Number of observations n: 189 , n.eff: 142 , AIC: 78.00334
#>
#> Number of iterations: 2
#> ---
#> Autocorrelation of glm.residuals
#> [1] 0.837748633 0.724407532 0.602588671 0.500754270 0.387294592
#> [6] 0.275433941 0.147728669 0.008716423 -0.130798183 -0.268641655
#> Autocorrelation of wavelet.residuals
#> [1] 0.427865614 0.201246818 0.139896894 0.103437367 0.046485859
#> [6] 0.045779123 0.025277466 0.034175107 0.005430915 0.055360212
Additionally, we offer multimodel inference tools for GEEs and WRMs which are loosely based on the MuMIn package. These are implemented in mmiWMRR and mmiGEE. They enable you to examine the effect that the grid resolution and variable selection have on the resulting regressions, and then select the appropriate model for subsequent analyses. Note that mmiWMRR has two more arguments than mmiGEE that must be specified (scale and detail in the example below).
# Example for WRMs
data(carlinadata)
coords<- carlinadata[,4:5]
wrm<- WRM(carlina.horrida ~ aridity + land.use, "poisson",
carlinadata, coords, level=1, wavelet="d4")
ms1<-scaleWMRR(carlina.horrida ~ aridity + land.use,"poisson",
carlinadata,coords,scale=1,wavelet='d4',plot=FALSE)
mmi<- mmiWMRR(wrm, data=carlinadata, scale=1, detail=TRUE)
#> ---
#> Level = 1
#>      (Int) aridity land.use df    logLik    AIC  delta weight
#> 3  3.63243         -3.78957  2 -1185.487 2375.0   0.00  0.995
#> 4  1.94780 2.11872 -3.42550  3 -1189.834 2385.7  10.69  0.005
#> 2 -0.82469 1.13710           2 -1200.512 2405.0  30.05  0.000
#> 1 -0.13692                   1 -1221.456 2444.9  69.94  0.000
# Example for GEEs
library(MASS)
data(birthwt)
# impose an artificial (not fully appropriate) grid structure
x<-rep(1:14,14)
y<-as.integer(gl(14,14))
coords<-cbind(x[-(190:196)],y[-(190:196)])
formula<-formula(low ~ race + smoke + bwt)
mgee<-GEE(formula, family = "gaussian", data = birthwt,
coord=coords, corstr="fixed", scale.fix=TRUE)
mmi<-mmiGEE(mgee, birthwt)
#>
#> Model selection table:
#>
#>     (Int)     race    smoke      bwt df     QLik   QIC  delta weight
#> 5 1.24916                   -0.00031  3  -52.861 111.0   0.00  0.275
#> 6 1.25578 -0.00240          -0.00031  4  -52.788 111.1   0.11  0.261
#> 7 1.25003          -0.00118 -0.00031  4  -52.874 111.3   0.31  0.236
#> 8 1.25964 -0.00288 -0.00293 -0.00031  5  -52.787 111.3   0.38  0.228
#> 4 0.29818  0.02756  0.04083           4 -121.597 294.6 183.65  0.000
#> 3 0.35269           0.02366           3 -123.187 295.2 184.28  0.000
#> 2 0.32531  0.02216                    3 -123.125 295.9 184.92  0.000
#> 1 0.36431                             2 -123.987 296.3 185.32  0.000
#>
#> ---
#> Relative variable importance:
#>
#>  race smoke   bwt
#> 0.489 0.464 1.000
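The weight and relative variable importance columns can be reproduced by hand from the delta column above. The sketch below assumes the conventional Akaike-weight formula, w_i = exp(-delta_i / 2) / sum_j exp(-delta_j / 2), applied here to QIC differences, and takes the importance of a variable to be the summed weight of the models that contain it. Which of the top four models contain race and smoke is inferred from the number of coefficients in each row, so treat that membership as an assumption; this is not the code inside mmiGEE.

```r
# Delta values copied from the model selection table, named by model number
delta <- c("5" = 0.00, "6" = 0.11, "7" = 0.31, "8" = 0.38,
           "4" = 183.65, "3" = 184.28, "2" = 184.92, "1" = 185.32)

# Akaike-style weights: relative support for each model
w <- exp(-delta / 2)
w <- w / sum(w)
round(w, 3)

# Models with non-negligible weight that contain each variable (assumed membership);
# models 1 to 4 carry essentially zero weight, so they are left out of the sums
race_models  <- c("6", "8")
smoke_models <- c("7", "8")
bwt_models   <- c("5", "6", "7", "8")

rvi <- c(race  = sum(w[race_models]),
         smoke = sum(w[smoke_models]),
         bwt   = sum(w[bwt_models]))
round(rvi, 3)
```

Rounded to three decimals, this gives 0.489 for race, 0.464 for smoke, and 1.000 for bwt, in agreement with the printed table.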
Finally, we offer one further model selection procedure specific to WRMs. rvi.plot
uses mmiWMRR
and creates a plot of the relative importance of each explanatory variable as a function of the resolution of the grid (in other words, as a function of the scale
argument in mmiWMRR
). It will also print the resulting model selection tables to the console.
data(carlinadata)
coords<- carlinadata[,4:5]
rvi.plot(carlina.horrida ~ aridity + land.use,"poisson",
data=carlinadata,coord=coords,maxlevel=4,detail=TRUE,wavelet="d4")
#>
#> Model selection tables:
#>
#> ---
#> Level = 1
#>      (Int) aridity land.use df    logLik    AIC  delta weight
#> 3  3.63243         -3.78957  2 -1185.487 2375.0   0.00  0.995
#> 4  1.94780 2.11872 -3.42550  3 -1189.834 2385.7  10.69  0.005
#> 2 -0.82469 1.13710           2 -1200.512 2405.0  30.05  0.000
#> 1 -0.13692                   1 -1221.456 2444.9  69.94  0.000
#> ---
#> Level = 2
#>      (Int) aridity land.use df    logLik    AIC  delta weight
#> 4  2.23516 0.59909 -2.93096  3 -1184.169 2374.3   0.00      1
#> 2 -0.73426 0.75117           2 -1209.233 2422.5  48.13      0
#> 3  2.72922         -3.16193  2 -1228.572 2461.1  86.81      0
#> 1 -0.40235                   1 -1262.854 2527.7 153.37      0
#> ---
#> Level = 3
#>      (Int) aridity land.use df    logLik    AIC  delta weight
#> 4  2.93438 0.49042 -3.50797  3 -1178.072 2362.1   0.00      1
#> 3  3.11997         -3.40857  2 -1198.412 2400.8  38.68      0
#> 2 -0.55307 0.46073           2 -1217.184 2438.4  76.22      0
#> 1 -0.28444                   1 -1238.104 2478.2 116.06      0
#> ---
#> Level = 4
#>      (Int) aridity land.use df    logLik    AIC  delta weight
#> 2 -1.36217 1.87641           2 -1205.696 2415.4   0.00      1
#> 1 -0.02245                   1 -1220.984 2444.0  28.58      0
#> 3  8.29497         -8.35099  2 -1272.480 2549.0 133.57      0
#> 4  7.65184 1.81274 -9.01193  3 -1292.419 2590.8 175.45      0
#>
#> ---
#> Relative variable importance:
#>
#>          level=1 level=2 level=3 level=4
#> aridity    0.005       1       1       1
#> land.use   1.000       1       1       0
You may also find that a model not implemented by this package works best for your data. We still have some spatially corrected accuracy measures that you can use to assess goodness of model fit. The first two of these are categorized according to whether or not their outputs are dependent on the chosen threshold and first appeared in spind 1.0 [9]. th.dep (threshold dependent) and th.indep (threshold independent) are designed to work on any number of model types; all you need is a set of actual values, predictions, and their associated coordinates. We’ll use the hook data set to see how these work.
data(hook)
# Familiarize yourself with the data
?hook
head(hook)
df<-hook[,1:2]
coords<-hook[,3:4]
# Threshold dependent metrics
th.dep.indices<-th.dep(data=df, coord=coords, spatial=TRUE)
# Confusion Matrix
th.dep.indices$cm
#> [,1] [,2] [,3] [,4]
#> [1,] 5 2 0 0
#> [2,] 3 1 1 3
#> [3,] 2 0 0 8
#> [4,] 2 3 0 70
# Kappa statistic
th.dep.indices$kappa
#> [1] 0.628529
# Threshold independent metrics
th.indep.indices<-th.indep(data=df, coord=coords, spatial=TRUE, plot.ROC=TRUE)
# AUC
th.indep.indices$AUC
#> [1] 0.9424119
# TSS
th.indep.indices$TSS
#> [1] 0.7425474
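For orientation, the classical non-spatial versions of two of these indices can be computed from an ordinary 2 x 2 confusion matrix, as in the sketch below. The data, the 0.5 threshold, and all object names are made up, and the spatial corrections applied by th.dep and th.indep are not reproduced here.

```r
# Made-up observations (1 = presence, 0 = absence) and predicted probabilities
actual    <- c(1, 1, 1, 1, 0, 0, 0, 0, 0, 0)
predicted <- c(0.9, 0.8, 0.6, 0.3, 0.7, 0.2, 0.1, 0.6, 0.2, 0.1)
threshold <- 0.5
pred_class <- as.integer(predicted >= threshold)

# Ordinary 2 x 2 confusion matrix
cm <- table(predicted = factor(pred_class, levels = c(1, 0)),
            actual    = factor(actual, levels = c(1, 0)))
TP <- cm["1", "1"]; FP <- cm["1", "0"]
FN <- cm["0", "1"]; TN <- cm["0", "0"]
n  <- sum(cm)

# Cohen's kappa: observed agreement corrected for chance agreement
p_obs    <- (TP + TN) / n
p_chance <- ((TP + FN) * (TP + FP) + (TN + FP) * (TN + FN)) / n^2
kappa    <- (p_obs - p_chance) / (1 - p_chance)

# True skill statistic: sensitivity + specificity - 1
tss <- TP / (TP + FN) + TN / (TN + FP) - 1

c(kappa = kappa, TSS = tss)
```

Both indices depend on the chosen threshold, which is why measures of this kind are grouped as threshold dependent, whereas the AUC summarizes performance across all thresholds.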
Additionally, we include the function acfft to calculate spatial autocorrelation of model residuals using Moran’s I statistic. Here is a quick example using a GLM and the musdata data set.
coords<- musdata[,4:5]
mglm <- glm(musculus ~ pollution + exposure, "poisson", musdata)
ac<-acfft(coords[ ,1], coords[ ,2], resid(mglm, type="pearson"),
lim1=0, lim2=1, dmax=10)
ac
#> [1] 0.685338504 0.509680590 0.363021118 0.247398654 0.144726020
#> [6] 0.084220961 0.050228656 0.022369044 -0.001985639 -0.027296083
Note that you can adjust the number of distance bins to examine in acfft using the dmax argument. The default is 10.
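To see what a Moran’s I value for a single distance bin measures, here is a direct (non-FFT) base R sketch. It assumes a binary weight of 1 for every pair of cells whose distance falls in (lower, upper] and 0 otherwise; morans_i and its arguments are hypothetical names and do not correspond to the arguments of acfft, which may bin distances differently.

```r
# Hypothetical helper: Moran's I for pairs of points with lower < distance <= upper
morans_i <- function(x, y, z, lower, upper) {
  d  <- as.matrix(dist(cbind(x, y)))   # pairwise Euclidean distances
  w  <- (d > lower & d <= upper) * 1   # binary weights for this distance bin
  zc <- z - mean(z)                    # centred values
  (length(z) / sum(w)) * sum(w * outer(zc, zc)) / sum(zc^2)
}

grid <- expand.grid(x = 1:4, y = 1:4)
checker  <- ifelse((grid$x + grid$y) %% 2 == 0, 1, -1)  # neighbours always differ
gradient <- grid$x                                      # neighbours are similar

i_checker  <- morans_i(grid$x, grid$y, checker,  lower = 0, upper = 1)
i_gradient <- morans_i(grid$x, grid$y, gradient, lower = 0, upper = 1)
c(checker = i_checker, gradient = i_gradient)
```

The checkerboard gives -1 (perfect negative autocorrelation between adjacent cells) and the gradient gives a positive value, which is the pattern you hope to see disappear from the residuals of a well-specified spatial model.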
Hopefully, you are now ready to utilize GEEs and WRMs to conquer the world of species distribution modeling. However, if this vignette has not served its purpose and you still have questions about how to use these tools (or how to improve this vignette), please let us know. Of course, no package is complete without bugs and we are always trying to improve our code. If you find any bugs that need squashing, or have suggestions for additional functionality or improvements to existing functionality, please don’t hesitate to contact us [10].
1. Carl, G. & Kuehn, I. (2007) Analyzing spatial autocorrelation in species distributions using Gaussian and logit models. Ecological Modelling 207, 159-170.
2. Carey, V. J. (2006) gee: Generalized Estimation Equation solver. Ported to R by Thomas Lumley (versions 3.13, 4.4) and Brian Ripley (version 4.13). R package version 4.13-11.
3. Yan, J. (2004) geepack: Generalized Estimating Equation Package. R package version 0.2.10.
4. Hardin, J. W. & Hilbe, J. M. (2003) Generalized Estimating Equations. Chapman and Hall, New York.
5. Barnett et al. (2010) Methods in Ecology and Evolution 1, 15-24.
6. Carl, G. & Kuehn, I. (2010) A wavelet-based extension of generalized linear models to remove the effect of spatial autocorrelation. Geographical Analysis 42(3), 323-337.
7. Whitcher, B. (2005) waveslim: Basic wavelet routines for one-, two- and three-dimensional signal processing. R package version 1.5.
8. Carl, G., Doktor, D., Schweiger, O. & Kuehn, I. (2016) Assessing relative variable importance across different spatial scales: a two-dimensional wavelet analysis. Journal of Biogeography 43, 2502-2512.
9. Carl, G. & Kuehn, I. (2016) spind: a package for computing spatially corrected accuracy measures. Ecography. DOI: 10.1111/ecog.02593
10. Contact email: levisc8@gmail.com, or visit the GitHub repo and create an issue at http://github.com/levisc8/spind/issues.