Visualising simulated data

This vignette gives an overview of the ways to plot the line list and contacts data output from the sim_linelist() and sim_outbreak() functions.

Plotting can be useful to identify certain transmission dynamics and patterns in the simulated data, or just to check that the simulated data looks as expected given how the simulation was parameterised.

library(simulist)
library(epiparameter)
library(incidence2)
library(epicontacts)

First we load the required delay distributions using the {epiparameter} package.

contact_distribution <- epiparameter(
  disease = "COVID-19",
  epi_name = "contact distribution",
  prob_distribution = create_prob_distribution(
    prob_distribution = "pois",
    prob_distribution_params = c(mean = 3)
  )
)
#> Citation cannot be created as author, year, journal or title is missing

infectious_period <- epiparameter(
  disease = "COVID-19",
  epi_name = "infectious period",
  prob_distribution = create_prob_distribution(
    prob_distribution = "gamma",
    prob_distribution_params = c(shape = 3, scale = 2)
  )
)
#> Citation cannot be created as author, year, journal or title is missing

# get onset to hospital admission from {epiparameter} database
onset_to_hosp <- epiparameter_db(
  disease = "COVID-19",
  epi_name = "onset to hospitalisation",
  single_epiparameter = TRUE
)
#> Using Linton N, Kobayashi T, Yang Y, Hayashi K, Akhmetzhanov A, Jung S, Yuan
#> B, Kinoshita R, Nishiura H (2020). "Incubation Period and Other
#> Epidemiological Characteristics of 2019 Novel Coronavirus Infections
#> with Right Truncation: A Statistical Analysis of Publicly Available
#> Case Data." _Journal of Clinical Medicine_. doi:10.3390/jcm9020538
#> <https://doi.org/10.3390/jcm9020538>.. 
#> To retrieve the citation use the 'get_citation' function

# get onset to death from {epiparameter} database
onset_to_death <- epiparameter_db(
  disease = "COVID-19",
  epi_name = "onset to death",
  single_epiparameter = TRUE
)
#> Using Linton N, Kobayashi T, Yang Y, Hayashi K, Akhmetzhanov A, Jung S, Yuan
#> B, Kinoshita R, Nishiura H (2020). "Incubation Period and Other
#> Epidemiological Characteristics of 2019 Novel Coronavirus Infections
#> with Right Truncation: A Statistical Analysis of Publicly Available
#> Case Data." _Journal of Clinical Medicine_. doi:10.3390/jcm9020538
#> <https://doi.org/10.3390/jcm9020538>.. 
#> To retrieve the citation use the 'get_citation' function

Setting the seed ensures we have the same output each time the vignette is rendered. When using {simulist}, setting the seed is not required unless you need to simulate the same line list multiple times.

set.seed(123)

Using a simple line list simulation with the factory default settings:

linelist <- sim_linelist(
  contact_distribution = contact_distribution,
  infectious_period = infectious_period,
  prob_infection = 0.33,
  onset_to_hosp = onset_to_hosp,
  onset_to_death = onset_to_death,
  outbreak_size = c(500, 1e4)
)

This line list contains 2066 cases.

Visualising incidence of onset, hospitalisation and death

This section of the vignette is heavily based upon examples given in the Get Started vignette in the {incidence2} package. It is highly recommended to read the documentation supplied in the {incidence2} package to explore the full range of functionality.

To visualise the number of cases with onset on a particular day, the {incidence2} package, and its dedicated class (<incidence2>) are used for handling and plotting this data.

Currently {simulist} outputs dates that are not rounded to the nearest day, i.e. it can be half way through a day. This is not obvious as R prints dates to the nearest day by default, and only by removing the date class (using unclass()) can you see the decimals (as R stores dates internally as the number of days since 1970-01-01).

Note storing dates as precise doubles and not as integer days may change in the near future.

The interval = "daily" is required as {incidence2} requires rounded dates to aggregate cases per unit time and specifying the interval will do this automatically for us.

# create incidence object
daily <- incidence(x = linelist, date_index = "date_onset", interval = "daily")

It is possible that not every date had the onset of symptoms, resulting in some dates missing entries. This is taken care of by the complete_dates() function in {incidence2}.

# impute for days without cases
daily <- complete_dates(daily)
plot(daily)
Daily incidence of cases from symptom onset including days with zero cases.
Daily incidence of cases from symptom onset including days with zero cases.

Alternatively, incidence can be plotting weekly:

weekly <- incidence(linelist, date_index = "date_onset", interval = "isoweek")
plot(weekly)
Weekly incidence of cases from symptom onset
Weekly incidence of cases from symptom onset

In order to check differences between a group in the line list data, for example sex, the <incidence2> data object can be recreated, specifying which columns to group by.

weekly <- incidence(
  linelist,
  date_index = "date_onset",
  interval = "isoweek",
  groups = "sex"
)
plot(weekly)
Weekly incidence of cases from symptom onset facetted by sex
Weekly incidence of cases from symptom onset facetted by sex

To visualise the onset, hospitalisation and death incidence in the same plot they can be jointly specified to the date_index argument of incidence2::incidence().

First the outcome data needs to be pivoted from long data to wide data to be input into incidence2::incidence().

Reshape line list data

Base R

# this can also be achieved with the reshape() function but the user interface
# for that function is complicated so here we just create the columns manually
linelist$date_death <- linelist$date_outcome
linelist$date_death[linelist$outcome == "recovered"] <- NA
linelist$date_recovery <- linelist$date_outcome
linelist$date_recovery[linelist$outcome == "died"] <- NA

Tidyverse

library(tidyr)
library(dplyr)
linelist <- linelist %>%
  tidyr::pivot_wider(
    names_from = outcome,
    values_from = date_outcome
  )
linelist <- linelist %>%
  dplyr::rename(
    date_death = died,
    date_recovery = recovered
  )

daily <- incidence(
  linelist,
  date_index = c(
    onset = "date_onset",
    hospitalisation = "date_admission",
    death = "date_death"
  ),
  interval = "daily",
  groups = "sex"
)
daily <- complete_dates(daily)
plot(daily)
Daily incidence of cases from symptom onset and incidence of hospitalisations and deaths.
Daily incidence of cases from symptom onset and incidence of hospitalisations and deaths.

Demographic data

Please see the Age structured population vignette for examples of how to plot the distribution of ages within a line list data set, including age pyramids.

The plotting code in vignettes is hidden by default, click the Code button with arrow to reveal the plotting code.

Visualising contact data

This section of the vignette is based upon examples from the {epicontacts} R package documentation and the examples provided in the The Epidemiological R Handbook chapter on transmission chains. We recommend going to the documentation of the {epicontacts} R package to see the all plotting and data wrangling functionality.

Just as we utilised the <incidence2> class from the {incidence2} package to handle and plot incidence data, we are going to use the <epicontacts> class from the {epicontacts} R package to handle and plot epidemiological contact data.

The benefit of using {epicontacts} is the same as {incidence2}, in the fact that a default plotting method is supplied by the package.

Advanced

Additionally, {epicontacts} provides access to network plotting from JavaScript libraries via the {visNetwork} and {threejs} R packages.

The {epicontacts} function make_epicontacts() requires both the line list and contacts table, so we will run the sim_outbreak() function to produce both. We will use the same epidemiological delay distributions that we used to simulate a line list above, but reduce the mean number of contacts in the contact distribution to 2.

contact_distribution <- epiparameter(
  disease = "COVID-19",
  epi_name = "contact distribution",
  prob_distribution = create_prob_distribution(
    prob_distribution = "pois",
    prob_distribution_params = c(mean = 2)
  )
)
#> Citation cannot be created as author, year, journal or title is missing
set.seed(1)
outbreak <- sim_outbreak(
  contact_distribution = contact_distribution,
  infectious_period = infectious_period,
  prob_infection = 0.5,
  onset_to_hosp = onset_to_hosp,
  onset_to_death = onset_to_death
)
head(outbreak$linelist)
#>   id           case_name case_type sex age date_onset date_admission   outcome
#> 1  1    Munsif al-Demian suspected   m  35 2023-01-01           <NA> recovered
#> 2  2       Zachary Nault confirmed   m  43 2023-01-01     2023-01-08 recovered
#> 3  3      Tae Woo Ralpho confirmed   m   1 2023-01-02           <NA> recovered
#> 4  5 Abdul Kader el-Diab suspected   m  78 2023-01-05     2023-01-06      died
#> 5  6      Katelyn Catlin confirmed   f  22 2023-01-02     2023-01-29      died
#> 6  8        Lynsey Duron confirmed   f  28 2023-01-04           <NA> recovered
#>   date_outcome date_first_contact date_last_contact ct_value
#> 1         <NA>               <NA>              <NA>       NA
#> 2         <NA>         2022-12-30        2023-01-05     25.3
#> 3         <NA>         2022-12-30        2023-01-02     25.8
#> 4   2023-01-25         2022-12-29        2023-01-02       NA
#> 5   2023-01-11         2023-01-02        2023-01-04     24.9
#> 6         <NA>         2023-01-04        2023-01-05     24.5
head(outbreak$contacts)
#>               from                     to age sex date_first_contact
#> 1 Munsif al-Demian          Zachary Nault  43   m         2022-12-30
#> 2 Munsif al-Demian         Tae Woo Ralpho   1   m         2022-12-30
#> 3    Zachary Nault Carisa Flores-Gonzalez  29   f         2022-12-27
#> 4    Zachary Nault    Abdul Kader el-Diab  78   m         2022-12-29
#> 5   Tae Woo Ralpho         Katelyn Catlin  22   f         2023-01-02
#> 6   Tae Woo Ralpho       Azzaam el-Vaziri  70   m         2022-12-31
#>   date_last_contact was_case         status
#> 1        2023-01-05        Y           case
#> 2        2023-01-02        Y           case
#> 3        2023-01-03        N under_followup
#> 4        2023-01-02        Y           case
#> 5        2023-01-04        Y           case
#> 6        2023-01-03        N under_followup

Using the line list and contacts data simulated we can create the <epicontacts> object.

epicontacts <- make_epicontacts(
  linelist = outbreak$linelist,
  contacts = outbreak$contacts,
  id = "case_name",
  from = "from",
  to = "to",
  directed = TRUE
)

The <epicontacts> object comes with a custom printing feature to see the data.

epicontacts
#> 
#> /// Epidemiological Contacts //
#> 
#>   // class: epicontacts
#>   // 12 cases in linelist; 21 contacts;  directed 
#> 
#>   // linelist
#> 
#> # A tibble: 12 × 12
#>    id               id.1 case_type sex     age date_onset date_admission outcome
#>    <chr>           <int> <chr>     <chr> <int> <date>     <date>         <chr>  
#>  1 Munsif al-Demi…     1 suspected m        35 2023-01-01 NA             recove…
#>  2 Zachary Nault       2 confirmed m        43 2023-01-01 2023-01-08     recove…
#>  3 Tae Woo Ralpho      3 confirmed m         1 2023-01-02 NA             recove…
#>  4 Abdul Kader el…     5 suspected m        78 2023-01-05 2023-01-06     died   
#>  5 Katelyn Catlin      6 confirmed f        22 2023-01-02 2023-01-29     died   
#>  6 Lynsey Duron        8 confirmed f        28 2023-01-04 NA             recove…
#>  7 Vladimir Sanch…    11 confirmed m        46 2023-01-04 NA             recove…
#>  8 Jacy Cousins       12 confirmed f        67 2023-01-06 NA             recove…
#>  9 Anushkaran Ken…    13 confirmed m        86 2023-01-07 NA             recove…
#> 10 Katja Muetz        19 probable  f        56 2023-01-09 NA             recove…
#> 11 Benito Redding     21 suspected m        50 2023-01-11 2023-01-13     died   
#> 12 Erin Payson        22 confirmed f         7 2023-01-11 NA             recove…
#> # ℹ 4 more variables: date_outcome <date>, date_first_contact <date>,
#> #   date_last_contact <date>, ct_value <dbl>
#> 
#>   // contacts
#> 
#> # A tibble: 21 × 8
#>    from   to      age sex   date_first_contact date_last_contact was_case status
#>    <chr>  <chr> <int> <chr> <date>             <date>            <chr>    <chr> 
#>  1 Munsi… Zach…    43 m     2022-12-30         2023-01-05        Y        case  
#>  2 Munsi… Tae …     1 m     2022-12-30         2023-01-02        Y        case  
#>  3 Zacha… Cari…    29 f     2022-12-27         2023-01-03        N        under…
#>  4 Zacha… Abdu…    78 m     2022-12-29         2023-01-02        Y        case  
#>  5 Tae W… Kate…    22 f     2023-01-02         2023-01-04        Y        case  
#>  6 Tae W… Azza…    70 m     2022-12-31         2023-01-03        N        under…
#>  7 Tae W… Lyns…    28 f     2023-01-04         2023-01-05        Y        case  
#>  8 Abdul… Amaa…    37 f     2023-01-10         2023-01-10        N        under…
#>  9 Katel… Lili…    61 f     2023-01-02         2023-01-06        N        under…
#> 10 Lynse… Vlad…    46 m     2023-01-07         2023-01-08        Y        case  
#> # ℹ 11 more rows

To plot the contact network we can use the plotting method that is supplied by {epicontacts} and will be automatically recognised if the {epicontacts} package is loaded (as done above with library(epicontacts)).

If you are viewing this vignette on the web (or on a web browser) the graph below is interactive and will allow you to highlight individuals in the network using the drop-down menu, to zoom in and out of the plot by scrolling, and to move the network using the mouse to drag and drop.

plot(epicontacts)

Contact network from infectious disease outbreak. This includes all contacts, i.e. individuals that were infected and not infected

There is also the option to plot the contacts network in 3D using the epicontacts::graph3D().

By default the outbreak simulated by sim_outbreak() contains contacts of cases that were not infected. These are shown in the previous network plot by terminal nodes that do not pass on infection to other individuals (note that terminal nodes can also be infected individuals that did not infect anybody else, either due to not have any contacts or due to the probabilistic nature of infection transmission). Here we show how to subset the contacts table in order to only plot the transmission network of cases from the outbreak.

Subset contact network to transmission network

Base R

outbreak$contacts <- outbreak$contacts[outbreak$contacts$was_case == "Y", ]

Tidyverse

library(dplyr)
outbreak$contacts <- outbreak$contacts %>%
  dplyr::filter(was_case == "Y")

head(outbreak$linelist)
#>   id           case_name case_type sex age date_onset date_admission   outcome
#> 1  1    Munsif al-Demian suspected   m  35 2023-01-01           <NA> recovered
#> 2  2       Zachary Nault confirmed   m  43 2023-01-01     2023-01-08 recovered
#> 3  3      Tae Woo Ralpho confirmed   m   1 2023-01-02           <NA> recovered
#> 4  5 Abdul Kader el-Diab suspected   m  78 2023-01-05     2023-01-06      died
#> 5  6      Katelyn Catlin confirmed   f  22 2023-01-02     2023-01-29      died
#> 6  8        Lynsey Duron confirmed   f  28 2023-01-04           <NA> recovered
#>   date_outcome date_first_contact date_last_contact ct_value
#> 1         <NA>               <NA>              <NA>       NA
#> 2         <NA>         2022-12-30        2023-01-05     25.3
#> 3         <NA>         2022-12-30        2023-01-02     25.8
#> 4   2023-01-25         2022-12-29        2023-01-02       NA
#> 5   2023-01-11         2023-01-02        2023-01-04     24.9
#> 6         <NA>         2023-01-04        2023-01-05     24.5
head(outbreak$contacts)
#>               from                  to age sex date_first_contact
#> 1 Munsif al-Demian       Zachary Nault  43   m         2022-12-30
#> 2 Munsif al-Demian      Tae Woo Ralpho   1   m         2022-12-30
#> 3    Zachary Nault Abdul Kader el-Diab  78   m         2022-12-29
#> 4   Tae Woo Ralpho      Katelyn Catlin  22   f         2023-01-02
#> 5   Tae Woo Ralpho        Lynsey Duron  28   f         2023-01-04
#> 6     Lynsey Duron    Vladimir Sanchez  46   m         2023-01-07
#>   date_last_contact was_case status
#> 1        2023-01-05        Y   case
#> 2        2023-01-02        Y   case
#> 3        2023-01-02        Y   case
#> 4        2023-01-04        Y   case
#> 5        2023-01-05        Y   case
#> 6        2023-01-08        Y   case
epicontacts <- make_epicontacts(
  linelist = outbreak$linelist,
  contacts = outbreak$contacts,
  id = "case_name",
  from = "from",
  to = "to",
  directed = TRUE
)
epicontacts
#> 
#> /// Epidemiological Contacts //
#> 
#>   // class: epicontacts
#>   // 12 cases in linelist; 11 contacts;  directed 
#> 
#>   // linelist
#> 
#> # A tibble: 12 × 12
#>    id               id.1 case_type sex     age date_onset date_admission outcome
#>    <chr>           <int> <chr>     <chr> <int> <date>     <date>         <chr>  
#>  1 Munsif al-Demi…     1 suspected m        35 2023-01-01 NA             recove…
#>  2 Zachary Nault       2 confirmed m        43 2023-01-01 2023-01-08     recove…
#>  3 Tae Woo Ralpho      3 confirmed m         1 2023-01-02 NA             recove…
#>  4 Abdul Kader el…     5 suspected m        78 2023-01-05 2023-01-06     died   
#>  5 Katelyn Catlin      6 confirmed f        22 2023-01-02 2023-01-29     died   
#>  6 Lynsey Duron        8 confirmed f        28 2023-01-04 NA             recove…
#>  7 Vladimir Sanch…    11 confirmed m        46 2023-01-04 NA             recove…
#>  8 Jacy Cousins       12 confirmed f        67 2023-01-06 NA             recove…
#>  9 Anushkaran Ken…    13 confirmed m        86 2023-01-07 NA             recove…
#> 10 Katja Muetz        19 probable  f        56 2023-01-09 NA             recove…
#> 11 Benito Redding     21 suspected m        50 2023-01-11 2023-01-13     died   
#> 12 Erin Payson        22 confirmed f         7 2023-01-11 NA             recove…
#> # ℹ 4 more variables: date_outcome <date>, date_first_contact <date>,
#> #   date_last_contact <date>, ct_value <dbl>
#> 
#>   // contacts
#> 
#> # A tibble: 11 × 8
#>    from   to      age sex   date_first_contact date_last_contact was_case status
#>    <chr>  <chr> <int> <chr> <date>             <date>            <chr>    <chr> 
#>  1 Munsi… Zach…    43 m     2022-12-30         2023-01-05        Y        case  
#>  2 Munsi… Tae …     1 m     2022-12-30         2023-01-02        Y        case  
#>  3 Zacha… Abdu…    78 m     2022-12-29         2023-01-02        Y        case  
#>  4 Tae W… Kate…    22 f     2023-01-02         2023-01-04        Y        case  
#>  5 Tae W… Lyns…    28 f     2023-01-04         2023-01-05        Y        case  
#>  6 Lynse… Vlad…    46 m     2023-01-07         2023-01-08        Y        case  
#>  7 Lynse… Jacy…    67 f     2023-01-04         2023-01-07        Y        case  
#>  8 Lynse… Anus…    86 m     2023-01-03         2023-01-06        Y        case  
#>  9 Anush… Katj…    56 f     2023-01-12         2023-01-15        Y        case  
#> 10 Anush… Beni…    50 m     2023-01-07         2023-01-09        Y        case  
#> 11 Katja… Erin…     7 f     2023-01-08         2023-01-10        Y        case
plot(epicontacts)

Transmission chain from infectious disease outbreak. This includes only individuals that were infected, including confirmed, probable and suspected cases.

Visualising other line list information

If there are other aspects of line list data that can be plotted and you would like to them added to this vignette please make an issue or pull request.