The streamDAG package provides indices and tools for analyzing directed acyclic graph (DAG) representations of intermittent stream networks. The DAG framework allows a wide range of analytical approaches for intermittent streams including classic measures from hydrology, ecology, and of course, graph theory. A focus of many streamDAG algorithms is the measurement of 1) “local” arc (stream segment) and nodal (inter-arc) characteristics, and 2) network-level complexity and connectivity. While many of these approaches are purely topological, a non-trivial number of DAG indices, particularly weighted approaches, will provide outcomes identical to existing hydrological (non-graph-theoretic) measures for streams. These include Integral Connectivity Scale Length (ICSL) and its variants (Western et al. 2013) Perennial streams (and even non-stream networks) can be potentially analyzed with streamDAG algorithms. However, the major motivator for the package was the development of procedures that consider the spatio-temporal variability of intermittent stream networks. As a result most streamDAG algorithms assume that graphs are directed (from upstream to downstream). Thus, these functions may produce errors if directed graphs are used without checking function arguments.

The streamDAG package is built under the basic idiom of the igraph package (Csardi and Nepusz 2006), and most of its functions utilize igraph basis algorithms. Newest (developmental) versions of streamDAG can be obtained from the public GitHub repository: https://github.com/moondog1969/streamDAG. The package maintainer is Ken Aho (). An introduction to the streamDAG package can be found in (Aho et al. 2023).

Newest versions of the streamDAG package can be installed from the R console via GitHub, after installing the package devtools. In particular, use:

library(devtools)
install_github("moondog1969/streamDAG")

Stable versions of the package will also be housed on the Comprehensive R Archive Network (CRAN) beginning in September 2023. Following this inception (version 1.4-4), streamDAG can be installed directly from CRAN mirrors using:

install.packages("streamDAG")

After installing streamDAG, the package can be loaded into R conventionally:

library(streamDAG)
Welcome to streamDAG!

For more information on using the package type:

vignette("streamDAG")

Introductory Examples of Usage

Murphy Creek

We begin with an in-depth demonstration of the streamDAG package using Murphy Creek, a very simple intermittent stream in the Reynolds Creek experimental watershed in southwestern Idaho (Fig 1). From 6/3/2019 to 10/3/2019, stream presence data were acquired at 15 minute intervals from 25 Murphy Creek nodes, corresponding to 24 stream segment arcs. Bounding nodes were added to encompass the entire length of the network. This resulted in a final Murphy Creek network with 28 nodes and 27 arcs for analysis.

The Reynolds Cr. experimental watershed in SW Idaho.

Figure 1: The Reynolds Cr. experimental watershed in SW Idaho.

Purely topological analyses can be conducted in streamDAG using only an igraph codified stream network. Below is a codification of Murphy Creek based on nodes established by Warix et al. (2021). The code IN_N --+ M1984 indicates that the stream flows from node IN_N to node M1984, and so on.

murphy_spring <- graph_from_literal(IN_N --+ M1984 --+ M1909, IN_S --+ M1993,
M1993 --+ M1951 --+ M1909 --+ M1799 --+ M1719 --+ M1653 --+ M1572 --+ M1452,
M1452 --+ M1377 --+ M1254 --+ M1166 --+ M1121 --+ M1036 --+ M918 --+ M823,
M823 --+ M759 --+ M716 --+ M624 --+ M523 --+ M454 --+ M380 --+ M233 --+ M153,
M153 --+ M91 --+ OUT)

This code is contained as an option in the function streamDAGs which also codifies other intermittent stream igraph objects.

streamDAGs("mur_full")
IGRAPH dba9e0e DN-- 28 27 -- 
+ attr: name (v/c)
+ edges from dba9e0e (vertex names):
 [1] IN_N ->M1984 M1984->M1909 M1909->M1799 IN_S ->M1993 M1993->M1951
 [6] M1951->M1909 M1799->M1719 M1719->M1653 M1653->M1572 M1572->M1452
[11] M1452->M1377 M1377->M1254 M1254->M1166 M1166->M1121 M1121->M1036
[16] M1036->M918  M918 ->M823  M823 ->M759  M759 ->M716  M716 ->M624 
[21] M624 ->M523  M523 ->M454  M454 ->M380  M380 ->M233  M233 ->M153 
[26] M153 ->M91   M91  ->OUT  

Much more flexibility in streamDAG functions is possible by defining stream spatial coordinates and graph weighting data, including stream lengths, nutrient loading, and information about stream segment presence (wet) or absence (dry).

The streamDAG package contains additional Murphy Creek data, including nodal spatial coordinates (UTMs), stream arc (segment) lengths, and stream arc presence absence data. Instream lengths and distances can be obtained from a number of sources including ARC-GIS. Stream presence can be ascertained using a number of methods, including conductivity and temperature sensors.

data(mur_coords) # Node spatial coords
data(mur_lengths) # Arc (stream segment) lengths
data(mur_node_pres_abs) # Node presence / absence data with explicit datetimes
data(mur_arc_pres_abs) # Arc (stream segment) simulated presence / absence data

Care should be taken to ensure that the order of relevant rows and columns and elements in ancillary data correspond to the order of nodes and arcs defined in the underlying graph, G with the functions igraph::V (which gives nodes) and A or igraph::E (which give arcs).

Within ancillary datasets, different code identifiers can be used to designate arcs. For instance, for an arc \(z = \vec{uv}\) where \(u\) is the tail of arc \(z\) and \(v\) is the head of \(z\), we could code: u--+v or u-->v or u --> v or u->v, or even u|v. The important thing is that the ordering is consistent with the arcs in the corresponding graph object. For instance, here are the first six arc names for the graph object murphy_spring.

head(A(murphy_spring))
+ 6/27 edges from dba4d73 (vertex names):
[1] IN_N ->M1984 M1984->M1909 M1909->M1799 IN_S ->M1993 M1993->M1951
[6] M1951->M1909

Note that these correspond to the identifiers for the first six stream lengths (in the first six rows) from mur_lengths.

head(mur_lengths)
            Arcs Lengths
1  IN_N -> M1984   20.30
2 M1984 -> M1909   75.00
3 M1909 -> M1799  108.99
4  IN_S -> M1993   68.30
5 M1993 -> M1951   27.60
6 M1951 -> M1909   14.40

Naming of nodes should be consistent with the node names in the corresponding graph object. For instance, here are the first six graph node names from murphy_spring.

head(V(murphy_spring))
+ 6/28 vertices, named, from dba4d73:
[1] IN_N  M1984 M1909 IN_S  M1993 M1951

The naming (and order) correspond to the first six identifiers (column names in this case) for presence absence data from mur_node_pres_abs.

names(mur_node_pres_abs)[1:7][-1] # ignoring datestamp column 1 
[1] "IN_N"  "M1984" "M1909" "IN_S"  "M1993" "M1951"

Spatial Plots

It is easy to depict a spatially explicit stream DAG using the streamDAG function spatial.plot. We can make a spatial plot by augmenting graph data with nodal spatial coordinates (Fig 2).

x <- mur_coords[,2]; y <- mur_coords[,3]
names = mur_coords[,1]
spatial.plot(murphy_spring, x, y, names, cex.text = .7)
Spatially explicit graph of the completely wetted Murphy Cr. network, as it occurs in the spring.

Figure 2: Spatially explicit graph of the completely wetted Murphy Cr. network, as it occurs in the spring.

ARC-GIS shapefiles can also be used to generate spatial plots with the function spatial.plot.sf. Use of shapefiles requires use of the libraries libraries ggplot2 and sf, and resulting graphs can be customized using ggplot2 modifiers (Fig 3). Use of shapefiles will eliminate some of the easy to easy-to-use features in spatial.plot including directional arrows indicating flow and the automated deletion of arcs and nodes with presence / absence data (see Section 2.3).

library(ggplot2); library(sf); library(ggrepel)
# Note that the directory "shape" also contains required ARC-GIS .shx,.cpg, and .prj files.
mur_sf <- st_read(system.file("shape/Murphy_Creek.shp", package="streamDAG"))
Reading layer `Murphy_Creek' from data source 
  `C:\Users\ahoken\AppData\Local\Temp\Rtmpu8HPEX\Rinst45d826d0611\streamDAG\shape\Murphy_Creek.shp' 
  using driver `ESRI Shapefile'
Simple feature collection with 2 features and 2 fields
Geometry type: LINESTRING
Dimension:     XY
Bounding box:  xmin: 512864.7 ymin: 4788962 xmax: 514722.6 ymax: 4789265
Projected CRS: NAD83 / UTM zone 11N
g1 <- spatial.plot.sf(x, y, names, shapefile = mur_sf)

## some ggplot customizations
g1 + expand_limits(y = c(4788562,4789700)) +
  theme(plot.margin = margin(t = 0, r = 10, b = 0, l = 0)) +
  geom_text_repel(data = mur_coords, aes(x = x, y = y, label = Object.ID), colour = "black", 
                size = 1.6, box.padding = unit(0.3, "lines"), point.padding = 
                  unit(0.25, "lines"))
Example of using a shapefile with `spatial.plot.sf`.

Figure 3: Example of using a shapefile with spatial.plot.sf.

Tracking Intermittency

The activity of stream nodes and/or arcs (segments) can be easily tracked in stream graphs based on STIC or conductivity data using the streamDAG functions delete.arcs.pa and delete.nodes.pa.

For instance, the dataset mur_node_pres_abs contains a subset of nodal presence  absence data for Murphy Creek in 2019. Below we see rows for time series observations 650 to 655.

mur_node_pres_abs[650:655,]
            Datetime IN_N M1984 M1909 IN_S M1993 M1951 M1799 M1719 M1653 M1572
6491  8/9/2019 22:30    0     0     0    0     0     0     1     0     0     1
6501  8/10/2019 1:00    0     0     0    0     0     0     1     0     0     1
6511  8/10/2019 3:30    0     0     0    0     0     0     1     0     0     1
6521  8/10/2019 6:00    0     0     0    0     0     0     1     0     0     1
6531  8/10/2019 8:30    0     0     0    0     0     0     1     0     0     1
6541 8/10/2019 11:00    0     0     0    0     0     0     1     0     0     1
     M1452 M1377 M1254 M1166 M1121 M1036 M918 M823 M759 M716 M624 M523 M454
6491     0     0     1     0     0     1    1    0    0    1    1    1    0
6501     0     0     1     0     0     1    1    0    0    1    1    1    0
6511     0     0     1     1     0     1    1    0    0    1    1    1    0
6521     0     0     1     1     0     1    1    0    0    1    1    1    0
6531     0     0     1     1     0     1    1    0    0    1    1    1    1
6541     0     0     1     1     0     1    1    0    0    1    1    1    1
     M380 M233 M153 M91 OUT
6491    0    1    1   1   1
6501    0    1    1   1   1
6511    0    1    1   1   1
6521    0    1    1   1   1
6531    0    1    1   1   1
6541    0    1    1   1   1

Modifying murphy_spring based on the nodal observations at 8/9/2019 22:30 we have:

npa <- mur_node_pres_abs[650,][,-1]
G1 <- delete.nodes.pa(murphy_spring, npa)

The resulting spatial plot is shown as Fig 4. Note that nodes without water are now omitted from the graph. Arcs missing one or more bounding nodes are also omitted.

spatial.plot(G1, x, y, names, cex.text = .7)
Plotting a modification of `murphy_spring` after application of `delete.nodes.pa`.

Figure 4: Plotting a modification of murphy_spring after application of delete.nodes.pa.

There are several graphical approaches for distinguishing “dry” and “wet” stream locations. The simplest is to simply show “wet” nodes and arcs bounded by “wet” nodes as in Fig 4. One can also show “dry” node locations by specifying show.dry = TRUE (Fig 5).

spatial.plot(G1, x, y, names, plot.dry = TRUE, cex.text = .7)
Dry nodes overlaid on `murphy_spring`.

Figure 5: Dry nodes overlaid on murphy_spring.

Finally, one can show “wet” nodes and associated arcs superimposed over the entire network, which includes, potentially, “dry” nodes and arcs. This approach requires generation of spatial.plot object representing the entire network, and specification of this object using the argument cnw i.e., complete network (Fig 6).

spc <- spatial.plot(murphy_spring, x, y, names, plot = FALSE)
spatial.plot(G1, x, y, names, plot.dry = TRUE, cex.text = .7, cnw = spc)
Dry portions of the network underlying wet nodes and associated arcs.

Figure 6: Dry portions of the network underlying wet nodes and associated arcs.

One can also modify graphs based on arc presence / absence data. The dataframe mur_arc_pres_abs contains simulated multivariate Bernoulli datasets for Murphy Cr. arcs based on 2019 nodal data.

head(mur_arc_pres_abs) # 1st 6 rows of data
  IN_N-->M1984 M1984-->M1909 M1909-->M1799 IN_S-->M1993 M1993-->M1951
1            1             0             1            0             0
2            1             0             1            0             1
3            1             0             1            0             0
4            0             1             1            1             0
5            0             0             1            0             0
6            0             1             0            0             0
  M1951-->M1909 M1799-->M1719 M1719-->M1653 M1653-->M1572 M1572-->M1452
1             0             1             0             1             1
2             0             0             1             1             1
3             1             0             1             0             0
4             1             1             1             1             1
5             0             0             0             1             1
6             1             0             0             1             0
  M1452-->M1377 M1377-->M1254 M1254-->M1166 M1166-->M1121 M1121-->M1036
1             0             1             1             0             0
2             0             1             0             1             1
3             1             1             1             0             0
4             1             1             0             1             1
5             0             1             0             1             1
6             1             1             1             0             0
  M1036-->M918 M918-->M823 M823-->M759 M759-->M716 M716-->M624 M624-->M523
1            1           0           0           1           1           1
2            1           1           1           1           1           1
3            0           1           0           1           1           1
4            1           1           1           1           1           0
5            1           0           0           1           1           1
6            1           0           0           0           1           1
  M523-->M454 M454-->M380 M380-->M233 M233-->M153 M153-->M91 M91-->OUT
1           1           0           0           0          1         1
2           1           1           1           0          1         1
3           1           1           0           1          1         1
4           1           1           1           1          1         1
5           1           1           1           1          1         1
6           1           1           0           1          1         1

Modifying murphy_spring arcs based on the 6th simulated multivariate Bernoulli dataset of arc presence / absence, we have:

G2 <- delete.arcs.pa(murphy_spring, mur_arc_pres_abs[6,])

The resulting spatial plot is shown in Fig 7. Note that all nodes are plotted, but plotted arcs are limited to those with recorded stream activity.

spatial.plot(G2, x, y, names, cex.text = .7)
Plotting a modified version of `murphy_spring` after application of `delete.arcs.pa`.

Figure 7: Plotting a modified version of murphy_spring after application of delete.arcs.pa.

Unweighted (Purely Topological) Measures for Stream DAGs

There are many measures useful for describing and distinguishing intermittent stream networks that are based solely on graph topological features (i.e., the presence or absence of nodes and adjoining arcs). These can be separated into local measures that describe the characteristics of individual stream nodes or arcs, and global measures that summarize the characteristics of an entire network, i.e., the entire graph.

Local measures

A number of local measures are included in the streamDAG function local.summary. The function only requires an igraph graph object.

local <- local.summary(murphy_spring)
round(local, 2)[,1:9]
                        IN_N M1984  M1909 IN_S M1993 M1951  M1799  M1719  M1653
alpha.cent              1.00  2.00   6.00 1.00  2.00  3.00   7.00   8.00   9.00
page.rank               0.01  0.01   0.03 0.01  0.01  0.02   0.03   0.04   0.04
imp.closeness.cent      0.00 27.00  90.00 0.00 27.00 40.50  78.75  77.40  78.30
betweenness.cent        0.00 23.00 110.00 0.00 24.00 46.00 126.00 140.00 152.00
n.nodes.in.paths        1.00  1.00   5.00 1.00  1.00  2.00   6.00   7.00   8.00
n.paths                 0.00  1.00   5.00 0.00  1.00  2.00   6.00   7.00   8.00
upstream.network.length 0.00  1.00   5.00 0.00  1.00  2.00   6.00   7.00   8.00
path.length.mean        0.00  1.00   1.80 0.00  1.00  1.50   2.50   3.14   3.75
path.length.var          NaN  0.00   0.56  NaN  0.00  0.25   0.92   1.55   2.44
path.length.skew          NA    NA   0.51   NA    NA   NaN   0.00  -0.35  -0.46
path.length.kurt          NA    NA  -0.61   NA    NA   NaN  -0.25  -0.30  -0.60
path.degree.mean          NA  0.00   1.20   NA  0.00  1.00   1.67   1.71   1.75
path.degree.var          NaN  0.00   0.16  NaN  0.00  0.00   0.22   0.49   0.44
path.degree.skew          NA    NA   2.24   NA    NA   NaN  -0.97   0.60   0.40
path.degree.kurt          NA    NA   5.00   NA    NA   NaN  -1.87  -0.35  -0.23
in.eccentricity         0.00  1.00   3.00 0.00  1.00  2.00   4.00   5.00   6.00
mean.efficiency         0.00  0.04   0.12 0.00  0.04  0.06   0.11   0.11   0.11

A graphical summary based only on measures with complete cases and standardized outcomes is shown in Fig 8. Nodes along the x-axis are sorted based on their order in the murphy_spring igraph object, which roughly corresponds to their order from sources to sink. In general, nodes increase in information and importance as distance to the sink decreases. Note, however, the “unusual” importance of M1909 due to its location at a confluence (Fig 2).

cc <- complete.cases(local)
local.cc <- local[cc,]
s.local <- t(scale(t(local.cc)))
ss.local <- stack(data.frame(s.local))
ss.local$metrics <- rep(row.names(local.cc), 28)
# theme used throughout vignette
theme_facet <- function(){
  theme(strip.background = element_blank(),
        strip.text.x = element_text(size = 10),
        axis.title.y = element_text(margin = margin(r = .2, unit = "in"), size = 11.5), 
        axis.title.x = element_text(margin = margin(r = .2, unit = "in"), size = 11.5),
        panel.background = element_rect(fill = "white", colour = "black",
                                        linewidth = 0.5),
        legend.position="none",
        panel.grid.major = element_blank(),
        panel.spacing = unit(0.2, "lines"),
        strip.placement = "inside",
        axis.ticks = element_line(colour = "black", linewidth = .2),
        panel.grid.minor = element_blank())
}

ggplot(ss.local, aes(x = ind, y = values)) +
   geom_bar(stat = "identity") +
   facet_wrap(~metrics) +
   theme_facet() + 
   theme(axis.text.x = element_text(angle = 90, vjust = 0.5, size=4.3)) +
   ylab("Standardized local measures") +
   xlab("\nNode")
Local graph-theoretic summaries for `murphy_spring`

Figure 8: Local graph-theoretic summaries for murphy_spring

Horizontal visibility graphs

A less frequently used, but potentially important tool for measuring nodal importance is the horizontal visibility graph (Luque et al. 2009). Two nodes will be visible from each other if, when node data (e.g., degrees) are plotted as horizontal bars along the abscissa axis, and placed along the ordinate based on their location in the stream path, the bars can be connected with a horizontal line (Luque et al. 2009). Note that the importance of M1909 in a visibility analysis based on indegree (Fig 9). Weighted (see below) node visibilities can also be obtained with multi.path.visibility.

vis <- multi.path.visibility(murphy_spring, source = c("IN_N","IN_S"),
sink = "OUT", autoprint = F)

barplot(vis$visibility.summary, las = 2, cex.names = .6, ylab = "Visible nodes",
        legend.text = c("Downstream", "Upstream", "Both"),
        args.legend = list(x = "topright", title = "Direction"))