wingen uses a moving window approach to create maps of genetic diversity. The method and its rationale are described in Bishop et al. (2023) and on the Methods blog.
To demonstrate how wingen works, we will use a subset of the data from the Bishop et al. (2023) simulation example. These simulations were created using Geonomics (Terasaki Hart et al., 2022) to generate a realistic landscape genomic dataset. In this simulation, spatial variation in genetic diversity is produced by varying population size and gene flow across the landscape via heterogeneous carrying capacity and conductance surfaces. These surfaces are based on an example digital elevation model of Tolkien’s Middle Earth produced by the Center for Geospatial Analysis at William & Mary (Robert, 2020).
The small Middle Earth example dataset used here contains four
objects which are loaded by load_middle_earth_ex():
lotr_vcf - a vcfR object containing the genetic
data
lotr_coords - a dataframe object containing sample
coordinates
lotr_lyr - a raster object of the landscape (higher
values indicate greater connectivity/carrying capacity)
lotr_range - a polygon outlining the “range” of the
simulated species
load_middle_earth_ex()
#>
#> -------------- middle earth example --------------
#>
#> Objects loaded:
#> lotr_vcf: vcfR object (100 variants x 100 samples)
#> lotr_coords: dataframe with x and y coordinates
#> lotr_lyr: middle earth RasterLayer (100 x 100)
#> lotr_range: sf polygon of the species range
#>
#> --------------------------------------------------
# Genetic data
lotr_vcf
#> Loading required namespace: vcfR
#> ***** Object of Class vcfR *****
#> 100 samples
#> 1 CHROMs
#> 100 variants
#> Object size: 0.1 Mb
#> 0 percent missing data
#> ***** ***** *****
# Coordinates
head(lotr_coords)
#> x y
#> 538 88.73547 -66.61610
#> 1397 78.50479 -23.24048
#> 1200 14.32163 -25.99363
#> 952 89.86373 -65.49860
#> 1177 45.42427 -23.13054
#> 383 98.89395 -13.88943
# Raster data
lotr_lyr
#> class : RasterLayer
#> dimensions : 100, 100, 10000 (nrow, ncol, ncell)
#> resolution : 1, 1 (x, y)
#> extent : 0, 100, -100, 0 (xmin, xmax, ymin, ymax)
#> crs : NA
#> source : memory
#> names : lyr
#> values : 0, 1 (min, max)
# Map of data
plot(lotr_lyr, col = magma(100), axes = FALSE, box = FALSE)
points(lotr_coords, col = mako(1, begin = 0.8), pch = 3, cex = 0.5)If users don’t have a raster layer of their landscape, they can
generate one using their sample coordinates with the
coords_to_raster() function. The resolution of this raster
can be tuned either with the agg (to aggregate) or
disagg (to disaggregate) arguments, or defined using the
res argument. The res argument can either be a
single value (e.g., 0.00833) or a vector of two values with the x and y
resolutions. The buffer argument can be used to add an edge
to the raster (i.e., buffer away from the coordinates).
The workflow of wingen uses three main functions:
window_gd(), circle_gd(), or
resist_gd() to generate moving window maps of genetic
diversity
wkrig_gd() to use kriging to interpolate the moving
window maps
mask_gd() to mask areas of the maps from (1) and (2)
(e.g., to exclude areas outside the study region)
There are three functions that can be used to generate moving window maps:
window_gd() uses a simple rectangular window with
user-specified dimensions; this is the original method described in
Bishop et al. 2023
circle_gd() uses a circular window with a
user-specified radius
resist_gd() uses a resistance distance-based window
with a user-specified maximum distance (this can be thought of as a
circular window with a radius that varies depending on the resistance of
the landscape around it)
The main arguments that all three functions have in common are:
gen - An object of type vcfR or a path to a vcf file
with genotype data. The order of this file matters! The coordinate and
genetic data should be in the same order, as there are currently no
checks for this.
coords - sf points, or a matrix or dataframe with
two columns representing the coordinates of the samples. The first
column should be x and the second should be y.
lyr - A SpatRaster or RasterLayer which the window
will move across to create the final map. In most cases, this will take
the form of a raster of the study area.
stat - The genetic diversity summary statistic to
calculate. Current genetic diversity metrics that can be specified with
stat include:
"pi" for nucleotide diversity (default) calculated
using hierfstat::pi.dosage() (only works for bi-allelic
data). Use the L argument to set the sequence length for
the pi calculation. If the sequence length is unknown, L
can be set to “nvariants” (default) which sets L to the
number of variants; however, note that this is not strictly the correct
calculation of pi because it does not account for invariant sites. If
L is set to NULL, the sum over SNPs of nucleotide diversity
is returned (note: L = NULL is the ). Otherwise,
L can be set to any numeric value which will serve as the
denominator of the pi calculation.
"Ho" for average observed heterozygosity across all
sites
"allelic_richness" for average number of alleles
across all sites
"biallelic_richness" for average allelic richness
across all sites for a biallelic dataset (this option is faster than
"allelic_richness"). When calculating
“biallelic_richness”, users can choose to rarify allele
counts (as in hierfstat::allelic.richness()) by setting
rarify_alleles = TRUE (the default) or to use the raw
allele counts by setting rarify_alleles = FALSE. We
recommend performing allele count rarefaction
(rarify_alleles = TRUE) if there are missing values in the
genetic data, but for datasets with no missing data, it is faster to use
the raw counts (rarify_alleles = FALSE)
"hwe" for the proportion of sites that are not in
Hardy-Weinberg equilibrium, calculated using pegas at the
0.05 level (other alpha levels can be specified by adding the
sig argument; e.g., sig = 0.10)
"basic_stats" for a series of statistics produced by
hierfstat::basic.stats() including mean observed
heterozygosity (same as Ho), mean gene diversities within population
(Hs), Gene diversities overall (Ht), and Fis following Nei (1987).
Population-based statistics (e.g., FST) normally reported by
hierfstat::basic.stats() are not included as they are not
meaningful within the individual-based moving windows
fact - To decrease computational time, we provide
the option to aggregate the input raster layer by some factor defined
using the fact argument. Increasing fact will
decrease the number of cells and thereby decrease the number of
calculations, with the trade-off that the resolution of the output
layers will decrease. Users should keep in mind that if they increase
fact, they may simultaneously want to decrease
wdim because the proportion of the landscape covered by the
neighborhood matrix could otherwise increase substantially.
rarify - Users have the option to perform
rarefaction by setting the rarify argument to
TRUE. If rarify = TRUE, users then define
rarify_n as the number of samples to rarify to and
rarify_nit as the number of iterations for rarefaction
(e.g., if rarify_n = 4 and rarify_nit = 5, for
each sample set, four random samples will be drawn five times). Users
can also set rarify_nit = "all"to use all possible
combinations of samples of size rarify_n within the window
(for example, if rarify_n = 4 and the number of samples in
the window is 5, all 20 possible combinations of samples will be used).
As the window moves across the landscape, three things can occur based
on the number of samples in the window: (1) if the number of samples is
lower than rarify_n, genetic diversity is not calculated
and a raster value of NA is assigned, (2) if the number of samples is
equal to rarify_n, the genetic diversity statistic is
calculated for those samples, (3) if the number of samples is greater
than rarify_n, rarefaction is implemented and that set of
samples is subsampled rarify_nit times to a size of
rarify_n and the mean (or another summary statistic set
using fun) of those rarify_nit iterations is
used. Note that if the number of samples is lower than
min_n, a raster value of NA will always be assigned, even
if the number of samples is not less than rarify_n (i.e.,
if min = 2 and rarify_n = 1 and the sample
size is 1 in the window, then an NA value will be assigned; to change
this behavior so that genetic diversity is calculated, set
min_n = 1).
If rarify = FALSE, rarefaction is not performed and only
steps (1) and (2) from above occur such that: (1) if the number of
samples in the window is less than the min_n argument,
genetic diversity is not calculated and a raster value of NA is
assigned, and (2) if the number of samples is equal to or greater than
min_n, the genetic diversity statistic is calculated for
those samples. We highly encourage users to perform rarefaction as
genetic diversity statistics are sensitive to sample size. The main
benefit of not performing rarefaction is decreased computational time;
however, this is not worth the trade-off in inaccuracy unless you are
confident that there is no effect of rarefaction after performing your
analysis with and without rarefaction.
Note: Coordinates and rasters used in wingen should be in a projected (planar) coordinate system such that raster cells are of equal sizes. Therefore, spherical systems (including latitude-longitude coordinate systems) should be projected prior to use. An example of how this can be accomplished is shown below. If no CRS is provided, a warning will be given and wingen will assume the data are provided in a projected system.
# First, we create example latitude-longitude coordinates and rasters
## Example raster:
lyr_longlat <- rast(
ncols = 40, nrows = 40, xmin = -110, xmax = -90, ymin = 40, ymax = 60,
crs = "+proj=longlat +datum=WGS84"
)
## Example coordinates:
coords_df <- data.frame(x = c(-110, -90), y = c(40, 60))
coords_longlat <- st_as_sf(coords_df, coords = c("x", "y"), crs = "+proj=longlat")
# Next, the coordinates and raster can be projected to an equal area projection, in this case the Goode Homolosine projection (https://proj.org/operations/projections/goode.html):
coords_eq <- st_transform(coords_longlat, crs = "+proj=goode")
lyr_eq <- project(lyr_longlat, "+proj=goode")
# Coordinates can be switched back to latitude-longitude the same way by replacing "goode" with "longlat"window_gd()To create a moving window map with a rectangular window, use
window_gd(). The main additional arguments
to window_gd() are:
wdim - Used to create the neighborhood matrix for
the moving window based on the dimensions provided. This argument can
either be set to one value (e.g., 3) which will create a square window
(e.g., 3 x 3), or two values, which will create a rectangular window
(e.g., 3 x 5). We encourage users to experiment with different values of
wdim to determine the sensitivity of their results to this
parameter. Ideally, wdim would be set with some knowledge
of the study system in mind (e.g., the dispersal patterns and/or
neighborhood size of the study organism). A preview of the window size
relative to the landscape can be obtained using the
preview_gd() function.
crop_edges - Whether to crop out the cells along the
edge of the raster that are not surrounded by a full window. Users may
want to do so to avoid “edge effects” caused by incomplete windows along
the borders of the raster. As wingen is relatively insensitive to sample
size, edge effects are not likely to have a very strong effect on
diversity estimates, so by default
crop_edges = FALSE.
Before running window_gd(), users can preview the moving
window and the counts within each cell of the raster to get a sense of
how big the window is and what the density of counts looks like across
their landscape. Here, we provide the raster layer
(lotr_lyr), the coordinates (lotr_coords), the
window dimensions (7 x 7), the aggregation factor (3), and the minimum
sample number (min_n). min_n will be used to
mask the sample count layer to show how much of the landscape will be
excluded due to low sample count. We specify
method = window for a preview of window_gd(),
we will discuss other methods (i.e., circle and
resist later).
preview_gd(lotr_lyr,
lotr_coords,
method = "window",
wdim = 7,
fact = 3,
sample_count = TRUE,
min_n = 2
)#> class : SpatRaster
#> size : 34, 34, 1 (nrow, ncol, nlyr)
#> resolution : 3, 3 (x, y)
#> extent : 0, 102, -102, 0 (xmin, xmax, ymin, ymax)
#> coord. ref. :
#> source(s) : memory
#> name : sample_count
#> min value : 2
#> max value : 11
Next, we run the moving window function with our vcf, coordinates, and raster layer. Here, we set the parameters to calculate pi, using a window size of 7 x 7, an aggregation factor of 3, and rarefaction with a rarefaction size of 2 (i.e., minimum sample size of 2), and 5 iterations.
We then plot the genetic diversity layer (the first layer of the produced raster stack) and the sample counts layer (the second layer).
Note: you will get a warning that no CRS is found for the provided coordinates or raster and to check that the CRS for these objects match. This is because these simulated data don’t have a CRS, but we know they match, so we can ignore this warning.
wgd <- window_gd(lotr_vcf,
lotr_coords,
lotr_lyr,
stat = "pi",
wdim = 7,
fact = 3,
rarify = TRUE,
rarify_n = 2,
rarify_nit = 5,
L = 100
)
#> Warning in crs_check_window(lyr, coords): No CRS found for the provided coordinates. Make sure the coordinates and the raster have the same projection (see function details or wingen vignette)
#> Warning in crs_check_window(lyr, coords): No CRS found for the provided raster. Make sure the coordinates and the raster have the same projection (see function details or wingen vignette)
# The ggplot_gd function plots the genetic diversity layer
ggplot_gd(wgd, bkg = lotr_range) +
ggtitle("Moving window pi") +
# Add sampling coordinates
geom_point(data = lotr_coords, aes(x = x, y = y), pch = 16)# The plot_count function plots the sample count layer
ggplot_count(wgd) +
ggtitle("Moving window sample counts")You can also create base R plots of your results using
plot_gd() and plot_count():
circle_gd()To create a moving window map with a circular window, we can use
circle_gd(). The main additional arguments
to circle_gd() are:
maxdist - maximum geographic distance used to define
the neighborhood; any samples farther than this distance will not be
included (this can be thought of as the neighborhood radius). Can be
provided as either (1) a single numeric value or (2) a SpatRaster where
each pixel is the maximum distance to be used for that cell on the
landscape (must be the same spatial scale as lyr). As with
wdim, we encourage users to experiment with different
values of maxdist to determine the sensitivity of their
results to this parameter and ideally set it with some knowledge of the
study system in mind (e.g., the dispersal patterns and/or neighborhood
size of the study organism).
distmat - optional distance matrix output from
get_geodist(); if not provided, circle_gd()
will automatically use the get_geodist() function to
calculate the distance matrix instead.
Like before, we can use the preview_gd() function to
preview parameter settings. This time we specify
method = "circle" and need to provide maxdist.
Before doing so, we can calculate the distance matrix using
get_geodist() which we can then use in both
preview_gd() and circle_gd() to save time.
get_geodist() produces a distance matrix where the rows
represent the cells of the raster and the columns are the individuals.
The values are the geographic distance between each cell and each
individual. Note that the input layer for get_geodist()
must have the same resolution as the input layer for
preview_gd() and circle_gd(). For example, if
you want to run circle_gd() with fact = 3 and
lyr = lotr_lyr, the input layer to
get_geodist() must be aggregate(lotr_lyr, 3).
Alternatively, you can first create a new aggregated raster to be used
for all functions, as we have done below.
Note: if you want to get only the distances between your
sample coordinates, set coords_only = TRUE.
# First, create a new aggregated raster
lotr_lyr5 <- aggregate(lotr_lyr, 5)
# Then calculate the distance matrix
distmat <- get_geodist(coords = lotr_coords, lyr = lotr_lyr5)
preview_gd(lotr_lyr5,
lotr_coords,
method = "circle",
maxdist = 10,
distmat = distmat,
sample_count = TRUE,
min_n = 2
)#> class : SpatRaster
#> size : 20, 20, 1 (nrow, ncol, nlyr)
#> resolution : 5, 5 (x, y)
#> extent : 0, 100, -100, 0 (xmin, xmax, ymin, ymax)
#> coord. ref. :
#> source(s) : memory
#> name : sample_count
#> min value : 2
#> max value : 9
Then, we can run circle_gd():
cgd <- circle_gd(lotr_vcf,
lotr_coords,
lotr_lyr5,
stat = "pi",
maxdist = 10,
distmat = distmat,
rarify = FALSE,
L = 100
)
ggplot_gd(cgd, bkg = lotr_range) +
ggtitle("Circle moving window pi")For circle_gd() (and resist_gd()), you can
also define maxdist with a raster to set the maximum
distance (i.e., the radius) for each cell. In this case, we can use the
carrying capacity/conductance map (lotr_lyr5) and multiply
it by 100 so that the radius will range from 0 to 100 based on the value
of the raster.
vcgd <- circle_gd(
lotr_vcf,
lotr_coords,
lotr_lyr5,
stat = "pi",
maxdist = lotr_lyr5 * 100,
distmat = distmat,
rarify = FALSE,
L = 100
)
ggplot_gd(vcgd, bkg = lotr_range) +
ggtitle("Circle moving window pi")resist_gd()To create a moving window map with resistance distances, use
resist_gd(). The main additional arguments
to resist_gd() are:
maxdist - maximum cost distance used to define the
neighborhood; any samples further than this cost distance will not be
included (this can be thought of as the neighborhood radius, but in
terms of cost distance). Can be provided as either (1) a single numeric
value or (2) a SpatRaster where each pixel is the maximum distance to be
used for that cell on the landscape (must be the same spatial scale as
lyr). As stated before, we encourage users to experiment
with different values of maxdist to determine the
sensitivity of their results to this parameter.
lyr - in addition to being the raster layer the
window moves across, this will also serve as the conductivity layer used
to calculate resistance distances (higher values should mean greater
conductivity).
distmat - optional distance matrix output from
get_resdist(); if not provided, resist_gd()
will automatically use the get_resdist() function to
calculate the distance matrix instead.
resist_gd() takes a longer time to run and we recommend
taking advantage of parallelization, in particular for the calculation
of the distance matrix using get_resdist(). We also
recommend first calculating the distance matrix with
get_resdist() and then providing it to
resist_gd() using the distmat argument as this
allows for resist_gd() to be run quickly multiple times
(i.e., to test different parameters) by avoiding repeating the costly
distance calculations. Note that lyr in both
get_resdist() and in resist_gd() must have the
same spatial scale (e.g., the same resolution and extent).
As with get_geodist(), get_resdist()
produces a distance matrix where the rows represent the cells of the
raster and the columns are the individuals. The values are the
resistance distance between each cell and each individual. If you want
to get only the distances between your sample coordinates, set
coords_only = TRUE.
Again, we can use the preview_gd() function to preview
parameter settings. Here, we specify method = "resist" and
provide maxdist.
preview_gd(lotr_lyr5,
lotr_coords,
method = "resist",
maxdist = 60,
distmat = lotr_distmat,
sample_count = TRUE,
min_n = 2
)#> class : SpatRaster
#> size : 20, 20, 1 (nrow, ncol, nlyr)
#> resolution : 5, 5 (x, y)
#> extent : 0, 100, -100, 0 (xmin, xmax, ymin, ymax)
#> coord. ref. :
#> source(s) : memory
#> name : sample_count
#> min value : 2
#> max value : 20
rgd <- resist_gd(lotr_vcf,
lotr_coords,
lotr_lyr5,
distmat = lotr_distmat,
stat = "pi",
maxdist = 60,
rarify = FALSE,
L = 100
)
ggplot_gd(rgd, bkg = lotr_range) +
ggtitle("Resistance moving window pi")In addition to using continuous rasters of connectivity, you can also use rasters to designate passable and impassable areas. This can be useful for designating boundaries you don’t want windows to extend across (e.g., mountain ranges) or extend between (e.g., rivers).
# create layer
resist_lyr <- rast(aggregate(lotr_lyr, 5))
# create an impassable area down the middle coded as NA
resist_lyr[] <- 1
resist_lyr[ext(40, 60, -100, 0)] <- NA
# get resistance distances
distmat <- get_resdist(coords = lotr_coords, lyr = resist_lyr)
# plot preview
preview_gd(resist_lyr, lotr_coords, method = "resist", distmat = distmat, maxdist = 10)#> class : SpatRaster
#> size : 20, 20, 1 (nrow, ncol, nlyr)
#> resolution : 5, 5 (x, y)
#> extent : 0, 100, -100, 0 (xmin, xmax, ymin, ymax)
#> coord. ref. :
#> source(s) : memory
#> name : sample_count
#> min value : 0
#> max value : 10
# create moving window map
resist_gd <- resist_gd(
lotr_vcf,
lotr_coords,
resist_lyr,
maxdist = 10,
distmat = distmat
)
ggplot_gd(resist_gd)
To produce smoother maps of genetic diversity, we provide the
function wkrig_gd() which creates a spatially interpolated
raster from the moving window raster produced by the moving window
functions. This function uses the gstat package to fit different
variogram models and select the best one to perform kriging.
Kriging is performed by first transforming the moving window layer into a set of coordinates with corresponding genetic diversity (or sample count) values and then interpolating using these coordinates across the grid provided. Because of this, it is important to keep in mind how the coordinates from the moving window raster and the grid align. If the resolution of the moving window raster is much lower than that of the grid, there are fewer points for interpolation which can result in grid-like artifacts during kriging.
To deal with this issue, users can either (1) resample their moving
window raster layers and grid layers to the same resolution using the
resample argument, or (2) manually disaggregate or
aggregate either the moving window or grid layers using the terra
aggregate() or disagg() functions. Generally,
if users want a smoother resulting surface, a higher resolution grid
layer should be used. Keep in mind that increasing the resolution of the
moving window layer (either by resampling or disaggregating) can
increase computational time as this increases the number of coordinates
being used for kriging. This is also the case for increasing the
resolution of the grid layer, though to a lesser extent.
To run this function, we provide the moving window raster layer and a
raster layer to interpolate across. If no raster layer for interpolation
is provided, the moving window layer will be used (i.e., the output
kriged layer will have the same resolution and extent of the input
window layer). If a stack of rasters is provided, only the first layer
will be used. The output of this function is a kriged raster layer.
Optionally, users can supply the moving window sample counts layer with
the weight_r argument to set weights for variogram fitting
and kriging, giving higher influence to cells with more samples. Users
may also want to test different values for nmax, which is
the maximum number of points to use for kriging; here, we use
nmax = 30 to balance speed and accuracy.
# First, let's make a layer to krige across
# Let's say I want to krige to twice the resolution of the moving window layer
# I can use the disagg() function to disaggregate the moving window layer
grd <- disagg(wgd, 2)
# Then, I can use this layer as the grid to krige across
kgd <- wkrig_gd(wgd[["pi"]], grd = grd, weight_r = wgd[["sample_count"]], nmax = 30)
#> [using ordinary kriging]
ggplot_gd(kgd) +
ggtitle("Kriged pi")Users can optionally get the full model output by setting
model_output = TRUE in which case wkrig_gd()
returns a list where the first element (“raster”) is the kriged raster,
the second element (“variogram”) is the variogram, and the third element
(“model”) is the best-fit variogram model.
Note: Kriging can produce values that fall outside the range
of possible values (e.g., genetic diversity values less than 0 or
greater than the possible maximum). By default, in order to ensure that
the values are bounded within the range of possible values,
wkrig_gd() converts all values greater than the maximum of
the input raster to that maximum (upper_bound = TRUE), and
all values less than the minimum of the input raster to that minimum
(lower_bound = TRUE). Users can turn off this functionality
by setting lower_bound and upper_bound to
FALSE. Users can also set their own custom
lower_bound and upper_bound values (e.g., for
heterozygosity or pi, you may want to set lower_bound = 0
and upper_bound = 1).
Legacy method: krig_gd() Users should note that
wkrig_gd() superseded krig_gd(). This legacy
function used the autoKrige() function from the R package
automap to perform kriging using an automatically generated variogram.
We have found that wkrig_gd() provides improved performance
for interpolation of the moving window raster and we recommend using it
instead of krig_gd().
Next, we mask the resulting kriged layers. Masking can be performed using a variety of methods.
Method 1: Mask using the carrying capacity layer to exclude any areas where the carrying capacity is lower than 0.1. Alternatively, one could use a species distribution model or habitat suitability model to exclude areas where the probability of presence is very low:
# Aggregate the lotr_lyr to make it the same resolution as kgd before masking
# Note: lotr_lyr is a RasterLayer, so we convert it to a SpatRaster
mask_lyr <- resample(rast(lotr_lyr), kgd)
mgd <- mask_gd(kgd, mask_lyr, minval = 0.1)
ggplot_gd(mgd) +
ggtitle("Kriged & carrying capacity masked pi")Method 2: Mask the layer using a species range map (in this case, an sf polygon) to exclude areas falling outside the species range.
Method 3: Mask the layer using a spatial Kernel Density Estimation (KDE) of sample density to exclude areas with low sampling density using the SpatialKDE package:
# First, install and load the SpatialKDE package
if (!require("SpatialKDE", quietly = TRUE)) install.packages("SpatialKDE")
library(SpatialKDE)# Note: this code is not evaluated when building the vignette as it requires the SpatialKDE package
# Spatial KDE requires an sf data.frame containing only POINTS that is in a projected coordinate system
# The simulated coordinates are not projected, but for the purposes of this example, we'll pretend they are projected under the Goode Homolosine projection
# This kind of arbitrary setting of crs should not be done for real data (see above example for how to properly project coordinates)
lotr_sf <- st_as_sf(lotr_coords, coords = c("x", "y")) %>% st_set_crs("+proj=goode")
# The grid layer must also be a RasterLayer for SpatialKDE
# Here, we use the kriged raster as our grid to get a KDE layer of the same spatial extent and resolution
grid <- raster(kgd)
# See the SpatialKDE package for more options and details about using KDE
kde_lyr <- kde(
lotr_sf,
kernel = "quartic",
band_width = 15,
decay = 1,
grid = grid,
)
# Visualize KDE layer
ggplot_count(kde_lyr) +
ggtitle("KDE")
# Mask with mask_gd
mgd <- mask_gd(kgd, kde_lyr, minval = 1)
ggplot_gd(mgd) +
ggtitle("Kriged & sample density masked pi")Another nice visualization option is to add a “background” to your plots in the form of a raster or other sp or sf object (e.g., a country or range boundary) which can help provide geographic context:
To increase computational speed, users can perform any of the moving window functions calculations with parallelization using future::plan().
Checkout the furrr package documentation for more details on setting up the back end for parallelization.
# Note: this code is not evaluated when building the vignette as it spawns multiple processes
# setup parallel session
future::plan("multisession", workers = 2)
wgd <- window_gd(lotr_vcf,
lotr_coords,
lotr_lyr,
stat = "pi",
wdim = 7,
fact = 3,
rarify_n = 2,
rarify_nit = 5,
rarify = TRUE
)
# end parallel session
future::plan("sequential")Note: the parallel and ncores
arguments have been deprecated as of wingen 2.0.1.
You can calculate multiple statistics at once by providing a vector
of statistic names to stat.
multistat_wgd <- window_gd(lotr_vcf,
lotr_coords,
lotr_lyr,
stat = c("pi", "Ho"),
wdim = 7,
fact = 3,
rarify = FALSE,
L = 100
)#>
#> [[2]]
"hwe" and "basic_stats" take longer to
calculate, so we do not evaluate these statistics here. We also change
the parameters and reduce the dataset in the example to make it faster
for users who wish to run it:
hwe_wgd <- window_gd(lotr_vcf[1:10, ],
lotr_coords,
lotr_lyr,
stat = "hwe",
wdim = 3,
fact = 5,
rarify = FALSE,
L = 100,
sig = 0.10
)
bs_wgd <- window_gd(lotr_vcf[1:10, ],
lotr_coords,
lotr_lyr,
stat = "basic_stats",
wdim = 3,
fact = 5,
rarify = FALSE,
L = 100
)
ggplot_gd(hwe_wgd, bkg = lotr_range)
ggplot_gd(bs_wgd, bkg = lotr_range)We provide a window_general function that can be used to
make moving window maps for other types of data inputs and functions.
Unlike window_gd, window_general does not
require a vcfR object or a path to a vcf file as input.
The required input (x) depends on the statistic
(stat) being calculated.
For the standard genetic diversity statistics:
If stat = "pi" or "biallelic_richness",
x must be a dosage matrix with values of 0, 1, or 2
(Note: rows must be individuals).
If stat = "Ho", x must be a
heterozygosity matrix where values of 0 = homozygosity and values of 1 =
heterozygosity (Note: rows must be individuals).
If stat = "allelic_richness", x must be
a genind type object.
If stat = "basic_stats", x must be a
hierfstat type object.
For other statistics:
If x is a vector, stat can be any
function that can be applied to a vector (e.g.,
stat = mean, var, sum, etc.).
If x is a matrix or data frame (Note: rows
must be individuals), stat can be any function that takes a
matrix or data frame and outputs a single numeric value (e.g., a
function that produces a custom diversity index). (Note: this
functionality has not have been tested extensively and may produce
errors, so use with caution).
As an example, let’s create a moving window map of our raster layer values (e.g., carrying capacity and conductance, in this case) at the sample coordinates:
# First, we extract the raster values at those coordinates
vals <- extract(lotr_lyr, lotr_coords)
# Next, we run the window_general function with the env vector and set the `stat` to mean
# Note: we can also provide additional arguments to functions, such as na.rm = TRUE
we <- window_general(vals,
coords = lotr_coords,
lyr = lotr_lyr,
stat = mean,
wdim = 7,
fact = 3,
rarify_n = 2,
rarify_nit = 5,
rarify = TRUE,
na.rm = TRUE
)
ggplot_gd(we) +
ggtitle("Mean raster value")Although the default input data type for wingen is a VCF file, which
is a standard file type to encode genomic data, wingen can accept a
genind object as input to calculate allelic richness using
window_general(). Given that some wingen functionality can
be feasible using a genind object, users could easily
convert various other genetic data file types to then run wingen.
For example, genepop (.gen) is a typical file format for encoding
microsatellite data, in which each allele within a locus is coded using
a two- or three-digit system (i.e., in a diploid organism, in a
two-digit system, each locus would be assigned four digits specifying
each of two alleles. In a three-digit system in the same organism, each
locus would be assigned six digits encoding those same two alleles).
Users can convert a genepop file into a genind object using
the read.genepop() function in the adegenet package, which
automatically reads in a .gen file as a genind object. In
many cases, microsatellite data may not be biallelic, and may contain
more than one observed allele at a given locus; in such cases, running
window_general() with a genind object is still
feasible.
An example of how one could use a genind object as input
into window_general() is as follows:
# We use the vcfR package to convert vcf to genind for our example
library(vcfR)
# Convert existing vcf example file into genind object:
genind <- vcfR2genind(lotr_vcf)
# Run window_general with no rarefaction
we_gi <- window_general(genind,
coords = lotr_coords,
lyr = lotr_lyr,
stat = "allelic_richness",
wdim = 7,
fact = 3,
na.rm = TRUE
)
ggplot_gd(we_gi) +
ggtitle("Allelic richness from genind")