Global Sensitivity Analysis has the goal of breaking a mathematical
model’s tough rind, allowing us to look inside. Let’s assume that we are
modeling a problem where two quantities of interest, \(\mathbf{Y} = \{Y_1, Y_2\}\), are defined as
a function of three variables, \(\mathbf{X} =
(X_1, X_2, X_3)\). We express this dependence through a linear
transformation \(\mathbf{Y} =
A\mathbf{X}\), where \[
A = \begin{bmatrix}
4 & -2 & 1 \\
2 & 5 & -1
\end{bmatrix}.
\] We assume that the inputs are uncertain and that this
uncertainty can be modeled using a multivariate normal distribution,
\(\mathbf{X} \sim \mathcal{N}(\mathbf{\mu},
\Sigma)\), with \(\mathbf{\mu} = (1, 1, 1)\) and \[
\Sigma = \begin{bmatrix}
1 & 0.5 & 0.5 \\
0.5 & 1 & 0.5 \\
0.5 & 0.5 & 1
\end{bmatrix}.
\] To perform an uncertainty quantification of the model, under the assumed distribution for the inputs, we can use a simple Monte Carlo simulation. We generate the input sample (x) and estimate the model output (y) for each input. We can define the model and simulate it in R using the following code block.
# Define the input distribution parameters
mx <- c(1, 1, 1)
Sigmax <- matrix(data = c(1, 0.5, 0.5, 0.5, 1, 0.5, 0.5, 0.5, 1), nrow = 3)
# Define the number of samples
N <- 1000
# Set the random number generator seed for reproducibility
set.seed(777)
# Sample from standard normals
x1 <- rnorm(N)
x2 <- rnorm(N)
x3 <- rnorm(N)
# Transform the standard normals into the required distribution
x <- cbind(x1, x2, x3)
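# Note: mx (length 3) is recycled element-wise down the columns of the N x 3 matrix;
# the shift is correct here only because all three components of the mean are equal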
x <- mx + x %*% chol(Sigmax)
# Define the model (matrix with coefficients)
A <- matrix(data = c(4, -2, 1, 2, 5, -1), nrow = 2, byrow = TRUE)
# Generate the output
y <- t(A %*% t(x))
colnames(y) <- c("y1", "y2")
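Because the model is a linear transformation of a Gaussian vector, the output is itself Gaussian, \(\mathbf{Y} \sim \mathcal{N}(A\mathbf{\mu}, A \Sigma A^\top)\). As a quick sanity check, we can compare the Monte Carlo estimates of the output moments with these closed-form values.
# Sanity check: the sample moments of y should be close to the theoretical ones,
# E[Y] = A mu and Cov(Y) = A Sigma A^T (exact for a linear model)
colMeans(y)
drop(A %*% mx)
cov(y)
A %*% Sigmax %*% t(A)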
We can plot the distribution of the output using the ggplot2 package.
library(ggplot2)
library(patchwork)
# Set the theme
theme_set(theme_minimal())
# Prepare the marginal and the 2D density plots
p1 <- ggplot(as.data.frame(y), aes(x = y1, y = y2)) +
geom_point(color = "black") +
geom_density_2d()
p2 <- ggplot(as.data.frame(y), aes(x = y1)) +
geom_density()
p3 <- ggplot(as.data.frame(y), aes(x = y2)) +
geom_density() +
coord_flip()
p2 + plot_spacer() + p1 + p3 +
plot_layout(ncol = 2, nrow = 2, widths = c(4, 1), heights = c(1, 4), axes = "collect")
Given this uncertainty, we are interested in identifying the most important inputs of the model. This is the goal the gsaot package has been designed for. With this package, we can compute different indices to evaluate the importance of the inputs with respect to different statistical properties of the output. Let's first compute the Wasserstein-Bures sensitivity indices. Since the inputs and outputs are Gaussian, these indices are the exact solution of the Optimal Transport problem. However, this is not usually the case. The only hyperparameter needed in this case is the number of partitions for the input data, M. We set this value to 25 since we only have 1000 data points.
library(gsaot)
M <- 25
indices_wb <- ot_indices_wb(x, y, M)
indices_wb
#> Method: wasserstein-bures
#>
#> Indices:
#> V1 V2 V3
#> 0.4951390 0.5072342 0.1387523
#>
#> Advective component:
#> V1 V2 V3
#> 0.3030834 0.3214758 0.1209107
#>
#> Diffusive component:
#> V1 V2 V3
#> 0.1920556 0.1857584 0.0178415
#>
#> Upper bound: 94.73206
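For reference, the advective and diffusive components reported above mirror the two terms of the closed-form squared 2-Wasserstein distance between Gaussian distributions: a mean-displacement (advective) term and a covariance (diffusive) term given by the Bures metric, \[
W_2^2\bigl(\mathcal{N}(\mathbf{m}_1, C_1), \mathcal{N}(\mathbf{m}_2, C_2)\bigr)
= \lVert \mathbf{m}_1 - \mathbf{m}_2 \rVert_2^2
+ \operatorname{Tr}\!\Bigl(C_1 + C_2 - 2\bigl(C_1^{1/2} C_2 C_1^{1/2}\bigr)^{1/2}\Bigr).
\] The reported upper bound is consistent with roughly twice the total output variance, \(2 \operatorname{Tr}(A \Sigma A^\top) = 96\) for this model.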
We can also use the package functions to create the plots for the indices.
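For example, assuming the plot() method that the package provides for the returned indices object, the indices computed above can be visualized with:
plot(indices_wb)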
To estimate the uncertainty of the indices, we can use bootstrapping. This is implemented in the function ot_indices_wb through the option boot = TRUE. In this case, we also have to set the number of bootstrap replicas (R).
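As a minimal sketch, assuming boot and R are passed directly to ot_indices_wb and using R = 100 as an illustrative number of replicas:
# Wasserstein-Bures indices with bootstrap-based uncertainty (illustrative choice: R = 100)
indices_wb_boot <- ot_indices_wb(x, y, M, boot = TRUE, R = 100)
indices_wb_boot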