Complex Climatologies

Robert W Schlegel

2018-06-22

Overview

One of the long-term goals of heatwaveR is to include many different methods for creating the climatologies used to detect heatwaves and cold-spells in time series data. To this end we made a very large change to the event detection pipeline, which is why we moved from the RmarineHeatWaves package to heatwaveR. This change was the introduction of the ts2clm() function and the removal of the climatology calculation found in RmarineHeatWaves::detect() in favour of detect_event(), which does not calculate climatologies. This separation allows for the introduction of a multitude of more complex climatology calculation and event detection methods. It is our overarching goal to provide one package that allows climate scientists to calculate these events in both the atmosphere and the oceans. But rather than talking about it, let’s walk through a case study on how this shift in the main pipeline of the package can be used for diverse applications.

Brought to our attention by Mr. Haouari from the IHFR institute of meteorology in Algeria was the concept of using a flat 25\(^\circ\)C tMin bottom boundary to screen out events calculated from tMax with the standard 90th percentile upper threshold. As the authors of the heatwaveR package are admittedly marine oriented, we tend to work with daily time series that have only one mean value per day. The use of tMin and tMax is therefore not accommodated explicitly in the arguments that one gives to the ts2clm() and detect_event() functions, but that does not mean that one cannot do so. Below we will work through the steps one would take to calculate (atmospheric) heatwaves, as per the definition for them laid out in Perkins and Alexander (2013), but excluding the calculation of EHF, and with the additional step proposed by Mr. Haouari. In the interest of reproducibility, we will be creating a tMin and tMax time series from the sst_WA data that is installed with heatwaveR. This is not technically correct to do, but will allow us to illustrate the methodology.

The methodology

In the following sub-sections we will walk through the step-by-step approach needed to calculate atmospheric heatwaves using a 90th percentile threshold created from the tMax time series for a location, and then filter the events based on the tMin series also needing to exceed 25\(^\circ\)C on the same days. We will finish by showing how to convert these results back into the format that event_line() and lolli_plot() expect, so that one may still use these convenient functions to visualise the results.

Data prep

The first step with any analysis in R should be the loading of the packages to be used.

library(tidyverse)
library(heatwaveR)

With our libraries loaded, we will now go about creating artificial tMin and tMax time series from sst_WA. Again, please note that this is not actually something that one should do. We only do so here to illustrate how one would go about doing this. Real tMin and tMax time series should be used when one is executing this methodology for proper research.

# Create tMin time series
tMin <- sst_WA %>% 
  mutate(temp = temp - 1)

# Create tMax time series
tMax <- sst_WA %>% 
  mutate(temp = temp + 1)
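
With real station data the two series would usually arrive as columns of a single table rather than being derived from one series. The following is a minimal sketch of that case, assuming a hypothetical data frame named station with columns t, tMin and tMax. These names and values are made up for illustration and are not part of heatwaveR. Each series is split into the two-column t/temp layout used by sst_WA.

```r
# Hypothetical station record; the object and column names are assumptions
station <- data.frame(
  t = seq(as.Date("2000-01-01"), by = "day", length.out = 5),
  tMin = c(18.2, 19.0, 21.5, 20.1, 17.8),
  tMax = c(29.4, 31.0, 33.2, 30.8, 28.9)
)

# Split into two series that each use the t/temp column names
tMin_real <- data.frame(t = station$t, temp = station$tMin)
tMax_real <- data.frame(t = station$t, temp = station$tMax)
```

Objects shaped like this could then stand in for the artificial tMin and tMax series used in the rest of this vignette.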

Climatology calculation

With our artificial time series created, we will now calculate the two ‘climatologies’ we need to correctly detect and filter the heatwaves. The first is the 90th percentile threshold based on the tMax time series. The second is the exceedance of 25\(^\circ\)C based on the tMin data.
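
To give a feel for what a 90th percentile threshold is, the base R sketch below computes one value per day of the year from a synthetic 30-year series. It is a deliberately simplified illustration and not how ts2clm() works internally: it only pools values that share the same day of the year, whereas ts2clm() additionally pools neighbouring days and smooths the result. The synthetic data and all object names are assumptions made for this example.

```r
# Synthetic daily series over 30 years (illustrative values only)
set.seed(1)
t <- seq(as.Date("1982-01-01"), as.Date("2011-12-31"), by = "day")
doy <- as.integer(format(t, "%j"))
temp <- 20 + 5 * sin(2 * pi * doy / 365.25) + rnorm(length(t))

# Simplified threshold: the 90th percentile of all values sharing a day of year
thresh <- tapply(temp, doy, quantile, probs = 0.9)

# Proportion of days on which the series exceeds its own threshold
prop_above <- mean(temp > thresh[as.character(doy)])
```

By construction roughly one day in ten exceeds such a threshold, which is why a minimum duration is needed before a run of warm days counts as an event.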

# The tMax threshold
# The WMO standard climatology period of 1981-01-01 to 2010-12-31 should be used where possible.
# Unfortunately, the OISST data, from which these data were drawn, only begin on 1982-01-01
tMax_clim <- ts2clm(tMax, climatologyPeriod = c("1982-01-01", "2011-12-31"), pctile = 90)

# The tMin exceedance
# Note the use here of 'minDuration = 3' and 'maxGap = 1' as the default atmospheric arguments
# The default marine arguments are 'minDuration = 5' and 'maxGap = 2'
tMin_exc <- exceedance(tMin, threshold = 25, minDuration = 3, maxGap = 1)

# Pull out each data.frame as its own object for easier use
tMin_exc_exceedance <- tMin_exc$exceedance
tMin_exc_threshold <- tMin_exc$threshold
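
To make the roles of the static threshold and minDuration concrete, the following base R sketch finds runs of consecutive days above a threshold in a short made-up series. It only illustrates the idea and is not the implementation inside exceedance(). In particular it ignores maxGap, and all object names and values are assumptions.

```r
# Made-up daily tMin values (illustrative only)
temp <- c(24, 26, 27, 26, 24, 26, 24, 26, 26, 26, 27)
threshold <- 25
min_duration <- 3

# Run-length encode the days above the threshold
above <- temp > threshold
runs <- rle(above)

# Keep only the runs of TRUE that last at least min_duration days
qualifying <- runs$values & runs$lengths >= min_duration
run_end <- cumsum(runs$lengths)
run_start <- run_end - runs$lengths + 1

exceedances <- data.frame(
  index_start = run_start[qualifying],
  index_end = run_end[qualifying],
  duration = runs$lengths[qualifying]
)
exceedances
```

Two runs qualify here, covering days 2 to 4 and days 8 to 11. The single warm day at index 6 is dropped because it is shorter than min_duration.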

Event detection

With our climatologies calculated we may now go about detecting the events in the tMax time series.

# Note the use here of 'minDuration = 3' and 'maxGap = 1' as the default atmospheric arguments
tMax_event <- detect_event(tMax_clim, minDuration = 3, maxGap = 1)

# Pull out each data.frame as its own object for easier use
tMax_event_event <- tMax_event$event
tMax_event_climatology <- tMax_event$climatology

With all of the events detected we may now use the tMin_exc_threshold object to screen out the events in tMax_event_event that had tMin values below our chosen bottom limit of 25\(^\circ\)C.

Filtering events

This is where things may get tricky for some users, and where the default use of the functions in the heatwaveR package ends. We are now going ‘off-road’ so to speak. But do not despair! The tidyverse suite of packages makes data wrangling like this much more user friendly than it was in the dark days of Base R coding. In order to more thoroughly illustrate the following steps we will further break them down into sub-sub-sections.

In order to make the filtering of events easier, we will combine the two different dataframes that we are using as guides to choose the events that meet all of our selection criteria.

# Join the climatology outputs of detect_event() and exceedance()
ts_clims <- left_join(tMax_event_climatology, tMin_exc_threshold, by = c("t"))

# Remove all days that did not qualify for exceedance()
ts_clims_filtered <- ts_clims %>%
  filter(event.y == TRUE)

With our two different filtering indices combined into one dataframe we only need one more ingredient before we can create our final product. We have already decided that we want to screen out events that dipped below a given static bottom threshold. Presumably this is a biologically relevant value that has been determined a priori through some other research. But how many days may the tMin values during an event fall below this threshold before the event must be excluded from our research? The following chunk of code shows how to calculate the number of days during each event on which tMin was above the bottom threshold. Comparing this count to the full duration of the event tells us how many days fell below it. What one chooses to do with that information is shown in the chunk after that.

# Calculate number of days for each event above the 25C threshold
ts_event_duration_thresh <- ts_clims_filtered %>%
  group_by(event_no) %>%
  summarise(event_duration_thresh = n()) %>%
  na.omit()
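
The counting step can be checked by hand on a toy table. The base R sketch below uses made-up columns: event_no is NA outside of events, and tMin_above marks days on which tMin exceeded the static threshold. Both column names are assumptions for this illustration only and do not come from the heatwaveR output.

```r
# Toy daily table (values are made up for illustration)
days <- data.frame(
  event_no = c(NA, 1, 1, 1, 1, NA, 2, 2, 2, NA),
  tMin_above = c(TRUE, TRUE, FALSE, TRUE, TRUE, FALSE, FALSE, FALSE, TRUE, TRUE)
)

# Keep the qualifying days that fall inside an event
in_event <- days[days$tMin_above & !is.na(days$event_no), ]

# Count the remaining days per event
counts <- aggregate(tMin_above ~ event_no, data = in_event, FUN = length)
names(counts)[2] <- "event_duration_thresh"
counts
```

Event 1 has 3 of its 4 days above the threshold and event 2 has 1 of its 3 days, so neither would pass a rule that requires every day of the event to qualify.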

Now that we have a third and final filtering index we may extract the events that meet all of the criteria we have chosen to impose on them.

# Filter out the events that were not above the static bottom threshold for their entire duration
ts_events_filtered <- left_join(tMax_event_event, ts_event_duration_thresh, by = "event_no") %>%
  na.omit() %>%
  filter(event_duration_thresh == duration)
ts_events_filtered
## # A tibble: 0 x 23
## # ... with 23 variables: event_no <int>, index_start <dbl>,
## #   index_peak <int>, index_end <dbl>, duration <int>, date_start <date>,
## #   date_peak <date>, date_end <date>, intensity_mean <dbl>,
## #   intensity_max <dbl>, intensity_var <dbl>, intensity_cumulative <dbl>,
## #   intensity_mean_relThresh <dbl>, intensity_max_relThresh <dbl>,
## #   intensity_var_relThresh <dbl>, intensity_cumulative_relThresh <dbl>,
## #   intensity_mean_abs <dbl>, intensity_max_abs <dbl>,
## #   intensity_var_abs <dbl>, intensity_cumulative_abs <dbl>,
## #   rate_onset <dbl>, rate_decline <dbl>, event_duration_thresh <int>

Above we see that the result of all of our filtering is that no events within the time series meet our criteria. We therefore need to loosen up a bit. We may do this by no longer requiring that tMin be above the bottom threshold for the entire duration of an event. To do so we will change the way in which we filter for ts_events_filtered. The following code chunk shows how to screen out events during which tMin failed to exceed the bottom threshold on more than 3 days.

ts_events_filtered <- left_join(tMax_event_event, ts_event_duration_thresh, by = "event_no") %>%
  na.omit() %>%
  filter(event_duration_thresh >= duration - 3)
ts_events_filtered
## # A tibble: 0 x 23
## # ... with 23 variables: event_no <int>, index_start <dbl>,
## #   index_peak <int>, index_end <dbl>, duration <int>, date_start <date>,
## #   date_peak <date>, date_end <date>, intensity_mean <dbl>,
## #   intensity_max <dbl>, intensity_var <dbl>, intensity_cumulative <dbl>,
## #   intensity_mean_relThresh <dbl>, intensity_max_relThresh <dbl>,
## #   intensity_var_relThresh <dbl>, intensity_cumulative_relThresh <dbl>,
## #   intensity_mean_abs <dbl>, intensity_max_abs <dbl>,
## #   intensity_var_abs <dbl>, intensity_cumulative_abs <dbl>,
## #   rate_onset <dbl>, rate_decline <dbl>, event_duration_thresh <int>

Still zero events. Were we to have a peek at ts_event_duration_thresh we would see that there were only three heatwaves in the entire time series that had tMin values exceeding the static threshold that we set at 25\(^\circ\)C. Furthermore, the majority of the tMin values are below the threshold. So rather than allowing for a set number of days below this threshold, let’s rather ask R to keep only the events in which at least a certain proportion of days had tMin above it. Let’s be generous and set this at 25% (i.e. 1/4).

ts_events_filtered <- left_join(tMax_event_event, ts_event_duration_thresh, by = "event_no") %>%
  na.omit() %>%
  filter(event_duration_thresh >= duration / 4)
ts_events_filtered
## # A tibble: 2 x 23
##   event_no index_start index_peak index_end duration date_start date_peak 
##      <int>       <dbl>      <int>     <dbl>    <int> <date>     <date>    
## 1       80        9581       9601      9615       35 2008-03-25 2008-04-14
## 2       92       10585      10651     10689      105 2010-12-24 2011-02-28
## # ... with 16 more variables: date_end <date>, intensity_mean <dbl>,
## #   intensity_max <dbl>, intensity_var <dbl>, intensity_cumulative <dbl>,
## #   intensity_mean_relThresh <dbl>, intensity_max_relThresh <dbl>,
## #   intensity_var_relThresh <dbl>, intensity_cumulative_relThresh <dbl>,
## #   intensity_mean_abs <dbl>, intensity_max_abs <dbl>,
## #   intensity_var_abs <dbl>, intensity_cumulative_abs <dbl>,
## #   rate_onset <dbl>, rate_decline <dbl>, event_duration_thresh <int>

And now we see that two heatwaves make it through the filter. One moderately long event from 2008, and ol’ faithful in 2010-2011 (Wernberg et al. 2016).
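
The three filtering rules tried above differ only in the comparison applied to duration and event_duration_thresh. The base R sketch below applies all three to a small made-up event table so that their relative strictness can be seen side by side. The values are assumptions chosen for illustration and are not taken from the sst_WA results.

```r
# Toy event table (values are made up for illustration)
events <- data.frame(
  event_no = 1:4,
  duration = c(5, 8, 12, 40),
  event_duration_thresh = c(5, 6, 3, 8)
)

# Rule 1: tMin above the threshold on every day of the event
strict <- events[events$event_duration_thresh == events$duration, ]

# Rule 2: at most 3 days on which tMin was not above the threshold
lenient_days <- events[events$event_duration_thresh >= events$duration - 3, ]

# Rule 3: tMin above the threshold on at least a quarter of the days
lenient_prop <- events[events$event_duration_thresh >= events$duration / 4, ]
```

In this toy table the strict rule keeps only event 1, the 3-day allowance keeps events 1 and 2, and the proportional rule keeps events 1 to 3. The proportional rule scales with event length, which is why it is the only one of the three that lets long events with many cool nights through.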

Visuals

We may now have our desired results, but if we want them to work with the built-in visualisation functions that come with heatwaveR we need one more step.

# Create artificial list object similar to detect_event() output
ts_filtered_list <- list(climatology = tMax_event_climatology,
                         event = ts_events_filtered)
# Then run event_line() on it
event_line(ts_filtered_list, start_date = "2010-01-01", end_date = "2012-05-30", spread = 50)

# Or visualise the categories
event_line(ts_filtered_list, start_date = "2010-01-01", end_date = "2012-05-30", 
           spread = 50, category = TRUE)

# Or lolli_plot as desired
lolli_plot(ts_filtered_list, event_count = 1)

One may of course visualise the outputs from the events calculated here with geom_flame() and geom_lolli() as well, but this will not differ from the default method of using these functions as outlined in their help files so we will not go into that here.

Categories

If one then wants to calculate the categories of the events that have met all of the rigours of our complex climatology one will use the same list object created for the visuals above.

ts_category <- category(ts_filtered_list, name = "WA")
ts_category
## # A tibble: 2 x 11
##   event_no event_name peak_date  category   i_max duration p_moderate
##      <int> <fct>      <date>     <chr>      <dbl>    <int>      <dbl>
## 1       80 WA 2008    2008-04-14 III Severe  3.83       35         57
## 2       92 WA 2011    2011-02-28 IV Extreme  6.58      105         52
## # ... with 4 more variables: p_strong <dbl>, p_severe <dbl>,
## #   p_extreme <dbl>, season <chr>

Summary

To be quite honest, I didn’t think it was going to work out to use SST data in place of atmospheric temperature and to create tMin and tMax time series through static subtraction and addition of values. Marine temperatures exhibit much more temporal auto-correlation than atmospheric data, which is why the default minimum duration is 5 days for marine heatwaves and 3 days for atmospheric heatwaves. This persistence allows marine events to be detected with the atmospheric definition, but it tends not to work at all the other way around. That being said, I think that the results of this vignette are clear enough to serve as a guideline for how to implement this methodology with proper atmospheric tMin and tMax data. Indeed, I have run real atmospheric data through this methodology myself and so do know that it works.

That concludes this vignette. I hope this will be useful both technically and theoretically. The authors of heatwaveR are very happy to receive any further input on the development of the package as well as other potential methods for calculating heatwaves and cold-spells. We see that the methodology outlined above is very useful and we are currently thinking about how best to incorporate these techniques ‘natively’ into the event detection pipeline. Until this has been made available in a later version, we hope that this will suffice.

References

Perkins, Sarah E., and Lisa V. Alexander. 2013. “On the measurement of heat waves.” Journal of Climate 26 (13): 4500–4517. doi:10.1175/JCLI-D-12-00383.1.

Wernberg, Thomas, Scott Bennett, Russell C Babcock, Thibaut De Bettignies, Katherine Cure, Martial Depczynski, Francois Dufois, et al. 2016. “Climate-driven regime shift of a temperate marine ecosystem.” Science 353 (6295): 169–72. doi:10.1126/science.aad8745.