Takens’ theory proves that, for a dynamic system \(\phi\), if its trajectory converges to an attractor manifold \(M\), which are consisted by a bounded and invariant set of states, then the mapping between \(\phi\) and M can be built and time series observations of \(\phi\) can be used to construct \(M\).
According to the generalized embedding theorem, for a compact \(d\)-dimensional manifold \(M\) and a set of observation functions \(\left<h_1,h_2,\ldots,h_L\right>\), the map \(\psi_{\phi,h} = \left<h_1\left(x\right),h_2\left(x\right),\ldots,h_L\left(x\right)\right>\) is an embedding of \(M\) with \(L = 2d + 1\). Here embedding means a one-to-one map resolving all singularities of the original manifold. The elements \(h_i\) can be lags of observations from single time series observations, lags of observations from multiple time series, or multiple observation functions. The first two constructions are only special cases of the third one.
By taking the measured values at one specific unit and its neighbors (named as spatial lags in spatial statistics) as a set of observation functions, \(\psi_{\phi,h} \left(x,s\right) = \left<h_s\left(x\right),h_{s\left(1\right)}\left(x\right),\ldots,h_{s\left(L-1\right)}\left(x\right)\right>\) is a embedding, where \(s\) is the focal unit currently under investigation and \(s\left(i\right)\) is its \(i\)-th order of spatial lags. \(h_s\left(x\right)\) and \(h_{s\left(i\right)}\left(x\right)\) are their observation functions respectively. (Hereinafter, we will use \(\psi \left(x,s\right)\) to present \(\psi_{\phi,h} \left(x,s\right)\) for short). For two spatial variables \(X\) and \(Y\) on the same set of spatial units, their values and spatial lags can be regarded as observation functions reading values from each spatial unit. As the spatial lags in each order contain more than one spatial units, the observation function can be set as the mean of the spatial units or other summary functions considering the spatial direction, to assure the one-to-one mapping of the original manifold \(M\).
The cross-mapping prediction is defined as:
\[ \hat{Y}_s \mid M_x = \sum\limits_{i=1}^{L+1} \left(\omega_{si}Y_{si} \mid M_x \right) \]
where \(s\) represents a spatial unit at which the value of \(Y\) needs to be predicted, \(\hat{Y}_s\) is the prediction result, \(L\) is the number of dimensions of the embedding, \(si\) is the spatial unit used in the prediction, \(Y_{si}\) is the observation value at \(si\) and simultaneously the first component of a state in \(M_y\), noted as \(\psi\left(y,s_i\right)\). In further, \(\psi\left(y,s_i\right)\) is determined by its one-to-one mapping point \(\psi\left(x,s_i\right)\), which is in turn one of the \(L+1\) nearest neighbors of the focal state in \(M_x\). \(\omega_{si}\) is the corresponding weight defined as:
\[ \omega_{si} \mid M_x = \frac{weight \left(\psi\left(x,s_i\right),\psi\left(x,s\right)\right)}{\sum_{i=1}^{L+1}weight \left(\psi\left(x,s_i\right),\psi\left(x,s\right)\right)} \] where \(weight \left(\ast,\ast\right)\) is the weight function between two states in the shadow manifold, defined as:
\[ weight \left(\psi\left(x,s_i\right),\psi\left(x,s\right)\right) = \exp \left(- \frac{dis \left(\psi\left(x,s_i\right),\psi\left(x,s\right)\right)}{dis \left(\psi\left(x,s_1\right),\psi\left(x,s\right)\right)} \right) \]
where \(\exp\) is the exponential function and \(dis \left(\ast,\ast\right)\) represents the distance function between two states in the shadow manifold defined as:
\[ dis \left(\psi\left(x,s_i\right),\psi\left(x,s\right)\right) = \frac{1}{L} \left(\left|h_{si}\left(x\right)-h_{s}\left(x\right)\right| + \sum_{k=1}^{L-1}abs \left[h_{si\left(k\right)}\left(x\right),h_{s\left(k\right)}\left(x\right)\right]\right) \] Note that the absolute value distance is used here.
The skill of cross-mapping prediction is measured by the Pearson correlation coefficient between the true observations and corresponding predictions:
\[ \rho = \frac{Cov\left(Y,\hat{Y}\right)}{\sqrt{Var\left(Y\right) Var\left(\hat{Y}\right)}} \]
The prediction skill \(\rho\) varies by setting different sizes of libraries, which means the quantity of observations used in reconstruction of the shadow manifold. We can use the convergence of \(\rho\) to infer the causal associations. For GCCM, the convergence means that \(\rho\) increases with the size of libraries and is statistically significant when the library becomes largest. And the confidence interval of \(\rho\) can be estimated based the \(z\)-statistics with the normal distribution:
\[ t = \rho \sqrt{\frac{n-2}{1-\rho^2}} \] where \(n\) is the number of observations to be predicted, and
\[ z = \frac{1}{2} \ln \left(\frac{1+\rho}{1-\rho}\right) \]
spEDM
packageLoad the spEDM
package:
Load data and package:
popd_nb = spdep::read.gal(system.file("extdata/popdensity_nb.gal",
package = "spEDM"))
## Warning in spdep::read.gal(system.file("extdata/popdensity_nb.gal", package = "spEDM")): neighbour
## object has 4 sub-graphs
popd_nb
## Neighbour list object:
## Number of regions: 2806
## Number of nonzero links: 15942
## Percentage nonzero weights: 0.2024732
## Average number of links: 5.681397
## 4 disjoint connected subgraphs
popdensity = readr::read_csv(system.file("extdata/popdensity.csv",
package = "spEDM"))
## Rows: 2806 Columns: 7
## ── Column specification ────────────────────────────────────────────────────────────────────────────────
## Delimiter: ","
## dbl (7): x, y, popDensity, DEM, Tem, Pre, slop
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
popdensity
## # A tibble: 2,806 × 7
## x y popDensity DEM Tem Pre slop
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 117. 30.5 780. 8 17.4 1528. 0.452
## 2 117. 30.6 395. 48 17.2 1487. 0.842
## 3 117. 30.8 261. 49 16.0 1456. 3.56
## 4 116. 30.1 258. 23 17.4 1555. 0.932
## 5 116. 30.5 211. 101 16.3 1494. 3.34
## 6 117. 31.0 386. 10 16.6 1382. 1.65
## 7 117. 30.2 350. 23 17.5 1569. 0.346
## 8 117. 30.7 470. 22 17.1 1493. 1.88
## 9 117. 30.6 1226. 11 17.4 1526. 0.208
## 10 116. 30.9 137. 598 13.9 1458. 5.92
## # ℹ 2,796 more rows
popd_sf = sf::st_as_sf(popdensity, coords = c("x","y"), crs = 4326)
popd_sf
## Simple feature collection with 2806 features and 5 fields
## Geometry type: POINT
## Dimension: XY
## Bounding box: xmin: 74.9055 ymin: 18.2698 xmax: 134.269 ymax: 52.9346
## Geodetic CRS: WGS 84
## # A tibble: 2,806 × 6
## popDensity DEM Tem Pre slop geometry
## * <dbl> <dbl> <dbl> <dbl> <dbl> <POINT [°]>
## 1 780. 8 17.4 1528. 0.452 (116.912 30.4879)
## 2 395. 48 17.2 1487. 0.842 (116.755 30.5877)
## 3 261. 49 16.0 1456. 3.56 (116.541 30.7548)
## 4 258. 23 17.4 1555. 0.932 (116.241 30.104)
## 5 211. 101 16.3 1494. 3.34 (116.173 30.495)
## 6 386. 10 16.6 1382. 1.65 (116.935 30.9839)
## 7 350. 23 17.5 1569. 0.346 (116.677 30.2412)
## 8 470. 22 17.1 1493. 1.88 (117.066 30.6514)
## 9 1226. 11 17.4 1526. 0.208 (117.171 30.5558)
## 10 137. 598 13.9 1458. 5.92 (116.208 30.8983)
## # ℹ 2,796 more rows
Select the appropriate embedding dimension E:
simplex(popd_sf,"Pre",lib = 1:2000,pred = 2001:nrow(popd_sf),k = 6,nb = popd_nb)
## The suggested embedding dimension E for variable Pre is 1
## E rho mae rmse
## [1,] 1 0.9944769 29.19078 44.20362
## [2,] 2 0.9938551 29.75890 46.68020
## [3,] 3 0.9923781 33.38572 52.24799
## [4,] 4 0.9906213 36.93827 57.98292
## [5,] 5 0.9879685 41.90347 66.04074
## [6,] 6 0.9855116 46.67320 72.81851
## [7,] 7 0.9831397 51.34790 79.25593
## [8,] 8 0.9815603 55.13373 83.48991
## [9,] 9 0.9801222 58.14001 87.12388
## [10,] 10 0.9779157 62.46149 92.64490
simplex(popd_sf,"popDensity",lib = 1:2000,pred = 2001:nrow(popd_sf),k = 6,nb = popd_nb)
## The suggested embedding dimension E for variable popDensity is 6
## E rho mae rmse
## [1,] 1 0.8033220 717.2946 2390.767
## [2,] 2 0.8964891 599.2670 1826.370
## [3,] 3 0.8947098 578.9176 1820.573
## [4,] 4 0.8945790 576.8006 1807.979
## [5,] 5 0.8962965 570.3921 1807.007
## [6,] 6 0.9054365 550.2581 1746.493
## [7,] 7 0.8990063 575.2825 1797.437
## [8,] 8 0.8976972 555.6466 1812.048
## [9,] 9 0.8896512 558.8580 1861.250
## [10,] 10 0.8948435 561.8943 1825.444
We choose the E with the highest rho and the lowest MAE and RMSE as
the most suitable one. Under the selected lib and pred, the optimal
embedding dimension E for the variable Pre
is 1, and for
the variable popDensity
, it is 6.
Run GCCM:
startTime = Sys.time()
pd_res = gccm(data = popd_sf,
cause = "Pre",
effect = "popDensity",
libsizes = seq(10, 2800, by = 100),
E = c(1,6),
k = 6,
nb = popd_nb,
progressbar = FALSE)
endTime = Sys.time()
print(difftime(endTime,startTime, units ="mins"))
## Time difference of 22.27982 mins
pd_res
## libsizes Pre->popDensity popDensity->Pre
## 1 10 0.04645034 0.00805449
## 2 110 0.16223010 0.01726522
## 3 210 0.22318304 0.02271083
## 4 310 0.25949735 0.02743318
## 5 410 0.28995847 0.03212161
## 6 510 0.32172353 0.03552054
## 7 610 0.35660688 0.03725974
## 8 710 0.38969118 0.03924114
## 9 810 0.41668529 0.04093319
## 10 910 0.44073658 0.04265137
## 11 1010 0.46353489 0.04481396
## 12 1110 0.48302501 0.04703306
## 13 1210 0.50238564 0.04947260
## 14 1310 0.51956650 0.05171305
## 15 1410 0.53570755 0.05355272
## 16 1510 0.55293176 0.05402601
## 17 1610 0.57236430 0.05421010
## 18 1710 0.59176733 0.05418337
## 19 1810 0.61016875 0.05451655
## 20 1910 0.62840034 0.05493706
## 21 2010 0.64621018 0.05553755
## 22 2110 0.66319432 0.05615816
## 23 2210 0.67972718 0.05650395
## 24 2310 0.69575258 0.05662365
## 25 2410 0.71142528 0.05708219
## 26 2510 0.72697356 0.05760167
## 27 2610 0.74231147 0.05824259
## 28 2710 0.75484801 0.05904463
Visualize the result:
Load data and package:
npp = terra::rast(system.file("extdata/npp.tif", package = "spEDM"))
npp
## class : SpatRaster
## dimensions : 404, 483, 3 (nrow, ncol, nlyr)
## resolution : 10000, 10000 (x, y)
## extent : -2625763, 2204237, 1877078, 5917078 (xmin, xmax, ymin, ymax)
## coord. ref. : CGCS2000_Albers
## source : npp.tif
## names : npp, pre, tem
## min values : 164.00, 384.3409, -47.8194
## max values : 16606.33, 23878.3555, 263.6938
terra::plot(npp, nc = 3,
mar = rep(0.1,4),
oma = rep(0.1,4),
axes = FALSE,
legend = FALSE)
To save the computation time, we will aggregate the data by 3 times and select 3000 non-NA pixels to predict:
npp = terra::aggregate(npp, fact = 3, na.rm = TRUE)
terra::global(npp,"isNA")
## isNA
## npp 14815
## pre 14766
## tem 14766
terra::ncell(npp)
## [1] 21735
nnamat = terra::as.matrix(!is.na(npp[[1]]), wide = TRUE)
nnaindice = terra::rowColFromCell(npp,which(nnamat))
dim(nnaindice)
## [1] 6920 2
set.seed(42)
indices = sample(nrow(nnaindice), size = 3000, replace = FALSE)
lib = nnaindice[-indices,]
pred = nnaindice[indices,]
Due to the high number of NA values in the npp raster data, we used all non-NA cell indices when testing for the most suitable embedding dimension.
simplex(npp,"pre",nnaindice,nnaindice,k = 5)
## The suggested embedding dimension E for variable pre is 2
## E rho mae rmse
## [1,] 1 0.9986687 178.4244 249.0336
## [2,] 2 0.9990565 144.0794 209.5812
## [3,] 3 0.9989996 148.3566 216.1486
## [4,] 4 0.9989168 156.0074 225.1383
## [5,] 5 0.9988119 165.5576 235.6243
## [6,] 6 0.9986605 176.7865 250.1238
## [7,] 7 0.9986104 179.9232 254.6314
## [8,] 8 0.9985710 183.4329 258.2804
## [9,] 9 0.9985349 186.6469 261.5637
## [10,] 10 0.9984171 193.6204 272.0611
simplex(npp,"npp",nnaindice,nnaindice,k = 5)
## The suggested embedding dimension E for variable npp is 2
## E rho mae rmse
## [1,] 1 0.9642440 424.0534 635.5334
## [2,] 2 0.9684523 388.4106 595.6706
## [3,] 3 0.9655137 377.1112 624.6225
## [4,] 4 0.9671040 360.4011 609.6115
## [5,] 5 0.9666078 366.4018 613.7714
## [6,] 6 0.9638873 369.2677 639.2369
## [7,] 7 0.9668067 367.8782 610.0439
## [8,] 8 0.9664286 366.7623 613.4352
## [9,] 9 0.9660476 370.2265 616.7720
## [10,] 10 0.9650804 374.4953 625.3163
Under the selected lib and pred, the optimal embedding dimension E
for the variable pre
is 2, and for the variable
npp
, it is also 2.
Run GCCM:
startTime = Sys.time()
npp_res = gccm(data = npp,
cause = "pre",
effect = "npp",
libsizes = seq(10,130,5),
E = 2,
k = 5,
RowCol = pred,
progressbar = FALSE)
endTime = Sys.time()
print(difftime(endTime,startTime, units ="mins"))
## Time difference of 24.68367 mins
npp_res
## libsizes pre->npp npp->pre
## 1 10 0.06990912 0.06192514
## 2 15 0.08711065 0.07275609
## 3 20 0.11581999 0.08954475
## 4 25 0.13388646 0.11312545
## 5 30 0.15133527 0.12053396
## 6 35 0.16465535 0.12834855
## 7 40 0.18151920 0.13740629
## 8 45 0.20732869 0.15100178
## 9 50 0.23477955 0.17119613
## 10 55 0.25759086 0.20008374
## 11 60 0.26634194 0.22521937
## 12 65 0.25765429 0.24842751
## 13 70 0.24071790 0.25299784
## 14 75 0.22828746 0.23493541
## 15 80 0.22148585 0.21775972
## 16 85 0.22555346 0.17916316
## 17 90 0.23947560 0.17657659
## 18 95 0.25040161 0.18059851
## 19 100 0.28862493 0.15892383
## 20 105 0.32793539 0.14364838
Visualize the result:
plot(npp_res,xlimits = c(9, 101),ylimits = c(-0.05,1))
## Warning: Removed 1 row containing missing values or values outside the scale range (`geom_line()`).
## Removed 1 row containing missing values or values outside the scale range (`geom_line()`).