CpDyna

Description

The CpDyna package is dedicated to dynamic graphs and implements the algorithm described in (Corneli, Latouche, and Rossi 2018). In that paper, a model based approached is proposed in order to cluster the vertices of a dynamic graph, while detecting multiple change points in the interaction intensities. In this vignette, we show how tu use the package on simulated and real datasets.

Installation

set.seed(1)
library(CpDyna)

Datasets

The package contains 1 simulated dataset and 2 real datasets.

Application (simulated data)

Loading the simulated dataset:

data("Gnu")

A custom partition is created based on the time horizon in Gnu:

tail(Gnu[1,])
## [1] 99.43402 99.44803 99.73759 99.82499 99.87731 99.97434
custom_ptn <- c(1:100)

Call to the function ModFit:

res <- ModFit(Gnu, 
              4, 
              4, 
              MinimalPartition = FALSE, 
              N_initializations = 20, 
              custom_partition = custom_ptn)

Looking at the estimated clusters/change points:

res$est_z
##  [1] 3 3 3 3 3 4 4 4 4 4 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
## [36] 2 2 2 2 2
res$est_cp
## [1]  20  30  60 100

and plotting the results:

par(mfrow = c(1,4))
ModPlot(Gnu, res, type = "adjacency")
## Warning: package 'knitr' was built under R version 3.4.3

Application (real data)

Loading the real datasets:

data(wtab)
data(stations_df)

In order to take a look at the dynamics, we plot an histogram of the interaction dates:

par(mfrow=c(1,1), bty="n", cex.axis=1.5, cex.lab=1.5)
hist(wtab$Start.Date, breaks = 60, xaxt="n", yaxt="n", main = "", xlab = "time (hours)")
last <- 86340      # roughly 24h
time.seq<-seq(from=1, to=last, by=3600)  
axis(1, at=time.seq, lab=c(0:23), lwd=.2)
axis(2, lwd=.2)

Gnu is slightly manipulated to fit the ModFit function format:

Gnu <- as.matrix(wtab)
Gnu <- t(Gnu)

Finally, the ModFit function is called to perform node clustering and change point detection:

N <- max(Gnu[2,], Gnu[3,])   # number of nodes/stations

step <- 900      # in seconds, corresponds to 15 minutes
custom_ptn <- seq(from = step, to = max(Gnu[1,]), by = step)  # user defind partition

res <- ModFit(Gnu, 4, 4, eps=10^(-1),MinimalPartition = FALSE, custom_partition = custom_ptn, N_initializations = 1,  verbose = TRUE)

The estimated change points are added to the histogram:

cp <- res$est_cp
abline(v = cp, col="red", lwd=1.5)

Plotting the results on a map

library(mapview)
library(sp)


new_df <- stations_df[, c(1,2,4,5)]

#converting columns from factor to proper formats
sq<-c(1,3,4)
for(i in 1:length(sq)) new_df[,sq[i]]<-as.numeric(levels(new_df[,sq[i]]))[new_df[,sq[i]]]
new_df[,2]<-as.character(levels(new_df[,2]))[new_df[,2]]

WhoIsWho <- seq(1:N)

match_pos <- res$est_z[match(new_df$id, WhoIsWho)]  # for each station (id) in new_df, i look for its position in WhoIsWho and take outz at this position
new_df<-cbind(new_df, match_pos)
pos_na <- which(is.na(new_df$match_pos))
new_df <- new_df[-pos_na, ]



tav <-c ("RoyalBlue3","red", "green3","gold")
sub_df <- new_df[,c(3,4)]
coordinates(sub_df) <- ~long+lat
proj4string(sub_df) <- CRS("+init=epsg:4326")
mapview(sub_df, color=tav[new_df$match_pos], cex= 0.5, alpha = 0.8, lwd=6)

References

Corneli, Marco, Pierre Latouche, and Fabrice Rossi. 2018. “Multiple Change Points Detection and Clustering in Dynamic Networks.” Statistics and Computing 28 (5): 989–1007. doi:10.1007/s11222-017-9775-1.