- `scoringutils` now depends on R 3.6. The change was made because the packages `testthat` and `lifecycle`, which are used in `scoringutils`, now require R 3.6. We also updated the GitHub Actions CI check to work with R 3.6.
- Fixed a bug in `set_forecast_unit()` where the function only worked with a `data.table`, but not a `data.frame`, as input.
- A data set shipped with `scoringutils` had duplicated entries. This was fixed by removing the duplicated rows.

# scoringutils 1.2.0

This major release contains a range of new features and bug fixes that have been introduced in minor releases since `1.1.0`. The most important changes are:
- New `set_forecast_unit()` function allows manual setting of the forecast unit.
- `summarise_scores()` gains a new `across` argument for summarising across variables.
- New `transform_forecasts()` and `log_shift()` functions allow forecast transformations. See the documentation for `transform_forecasts()` for more details and an example use case.
- Fixed a bug in `get_prediction_type()` for integer matrix input.
- Fixed a bug in `interval_score()` for small interval ranges.

Thanks to @nikosbosse, @seabbs, and @sbfnk for code and review contributions. Thanks to @bisaloo for the suggestion to use a linting GitHub Action that only triggers on changes, and @adrian-lison for the suggestion to add a warning to `interval_score()` if the interval range is between 0 and 1.
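The warning guards against accidentally passing the interval range on the 0 to 1 scale (e.g. `0.9`) where a percentage (e.g. `90`) is expected. As a rough illustration only, here is a sketch of the standard interval score with such a check; `interval_score_sketch()` is our own toy function, not the scoringutils implementation:

```r
# Sketch of the interval score for a single central prediction interval.
# The `range` argument is meant in percent (90 for a 90% interval), so a
# value between 0 and 1 is usually a mistake.
interval_score_sketch <- function(true_value, lower, upper, range) {
  if (range > 0 && range < 1) {
    warning("Range is between 0 and 1; did you mean ", range * 100, "%?")
  }
  alpha <- (100 - range) / 100
  dispersion <- upper - lower
  underprediction <- 2 / alpha * pmax(0, lower - true_value)
  overprediction <- 2 / alpha * pmax(0, true_value - upper)
  dispersion + underprediction + overprediction
}

interval_score_sketch(true_value = 5, lower = 1, upper = 4, range = 90)
```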
## Package updates

- The documentation was updated to reflect the recent changes since `scoringutils 1.1.0`. In particular, usage of the functions `set_forecast_unit()`, `check_forecasts()` and `transform_forecasts()` is now documented in the vignettes. The introduction of these functions enhances the overall workflow and helps to make the code more readable. All functions are designed to be used together with the pipe operator. For example, one can now write something like the following:

```r
example_quantile |>
  set_forecast_unit(c("model", "location", "forecast_date", "horizon", "target_type")) |>
  check_forecasts() |>
  score()
```
Documentation for `transform_forecasts()` has also been extended. This function allows the user to easily add transformations of forecasts, as suggested in the paper "Scoring epidemiological forecasts on transformed scales". In an epidemiological context, for example, it may make sense to apply the natural logarithm before scoring forecasts, in order to obtain scores that reflect how well models are able to predict exponential growth rates, rather than absolute values. Users can now do something like the following to score a transformed version of the data in addition to the original one:

```r
data <- example_quantile[true_value > 0, ]
data |>
  transform_forecasts(fun = log_shift, offset = 1) |>
  score() |>
  summarise_scores(by = c("model", "scale"))
```
Here we use the `log_shift()` function to apply a logarithmic transformation to the forecasts. This function was introduced in `scoringutils 1.1.2` as a helper function that acts just like `log()`, but has an additional argument `offset` that can add a number to every prediction and observed value before applying the log transformation.
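Conceptually, `log_shift()` behaves like the following sketch (ours, not the package code; the packaged function additionally validates its inputs):

```r
# Minimal sketch of the idea behind log_shift(): add an offset before
# taking the log, so that zero counts do not produce -Inf.
log_shift_sketch <- function(x, offset = 0) {
  log(x + offset)
}

log_shift_sketch(0, offset = 1)  # log(0 + 1) = 0
```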
# scoringutils 1.1.3

## Package updates

- Made `check_forecasts()` and `score()` pipeable (see issue #290). This means that users can now directly use the output of `check_forecasts()` as input for `score()`. As `score()` otherwise runs `check_forecasts()` internally anyway, this simply makes the step explicit and helps to write clearer code.

Release by @seabbs in #305. Reviewed by @nikosbosse and @sbfnk.
# scoringutils 1.1.2

## Package updates

- The `prediction_type` argument of `get_forecast_unit()` has been dropped. Instead, a new internal function `prediction_is_quantile()` is used to detect if a quantile variable is present. Whilst this is an internal function, it may impact some users as it is accessible via `find_duplicates()`.
- Made the behaviour of `bias_range()` and `bias_quantile()` more obvious to the user, as it may otherwise cause unexpected behaviour.
- Refactored `bias_range()` so that it uses `bias_quantile()` internally.
- Added input checks to `bias_range()`, `bias_quantile()`, and `check_predictions()` to make sure that the input is valid.
- Added tests for `bias_range()`, `bias_quantile()`, and `bias_sample()`.

## Bug fixes

- Fixed a bug in `get_prediction_type()` which led to it being unable to correctly detect integer forecasts (instead categorising them as continuous) when the input was a matrix. This issue impacted `bias_sample()` and also `score()` when used with integer forecasts, resulting in lower bias scores than expected.
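The fix concerned detecting whether a matrix of samples is integer-valued. A hedged sketch of the kind of check involved (ours, not the internal `get_prediction_type()` code):

```r
# Integer-valued samples are often stored as numeric; a robust check
# compares values to their rounded versions instead of testing the
# storage type, and works for matrices as well as vectors.
looks_integer <- function(x) {
  is.numeric(x) && all(x == round(x))
}

m <- matrix(c(1, 2, 3, 4), nrow = 2)
looks_integer(m)        # TRUE: integer-valued despite numeric storage
looks_integer(m + 0.5)  # FALSE: continuous
```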
# scoringutils 1.1.1

## Package updates

- Added a new argument, `across`, to `summarise_scores()`. This argument allows the user to summarise scores across different forecast units as an alternative to specifying `by`. See the documentation for `summarise_scores()` for more details and an example use case.
- Added a new function, `set_forecast_unit()`, that allows the user to set the forecast unit manually. The function removes all columns that are not relevant for uniquely identifying a single forecast. If not done manually, `scoringutils` attempts to determine the unit of a single forecast automatically by simply assuming that all column names are relevant to determine the forecast unit. This can lead to unexpected behaviour, so setting the forecast unit explicitly can help make the code easier to debug and easier to read (see issue #268). When used as part of a workflow, `set_forecast_unit()` can be directly piped into `check_forecasts()` to check everything is in order.
- Added a warning to `interval_score()` if the interval range is between 0 and 1. Thanks to @adrian-lison (see #277) for the suggestion.
- … the `epinowcast` package.
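To illustrate the idea behind `across`: summarising across some variables is equivalent to grouping by all remaining columns of the forecast unit. A toy base-R sketch (the data and column names here are made up, not scoringutils output):

```r
# Toy scores table: one score per model / location / horizon combination.
scores <- data.frame(
  model    = rep(c("A", "B"), each = 4),
  location = rep(c("X", "X", "Y", "Y"), times = 2),
  horizon  = rep(1:2, times = 4),
  score    = 1:8
)

# Summarising "across" location and horizon means grouping by what is
# left, here the model column:
aggregate(score ~ model, data = scores, FUN = mean)
```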
# scoringutils 1.1.0

## Feature updates

- Added a new function, `transform_forecasts()`, to make it easy to transform forecasts before scoring them, as suggested in Bosse et al. (2023), https://www.medrxiv.org/content/10.1101/2023.01.23.23284722v1.
- Added a function, `log_shift()`, that implements the default transformation function. The function allows the user to add an offset before applying the logarithm.

## Package updates

- Made a small change to `interval_score()` which explicitly converts the logical vector to a numeric one. This should happen implicitly anyway, but is now done explicitly in order to avoid issues that may come up if the input vector has a type that doesn't allow the implicit conversion.

# scoringutils 1.0.1

A minor update to the package with some bug fixes and minor changes following the major release `1.0.0`.

## Package updates
- Renamed the `metric` argument of `summarise_scores()` to `relative_skill_metric`. The old argument name is now deprecated and will be removed in a future version of the package. Please use the new argument instead.
- Updated the documentation of `score()` and related functions to make the soft requirement for a `model` column in the input data more explicit.
- Improved the documentation of `score()`, `pairwise_comparison()` and `summarise_scores()` to make it clearer what unit of a single forecast is required for computations.
- Simplified `plot_pairwise_comparison()`, which now only supports plotting mean score ratios or p-values; the hybrid option to print both at the same time was removed.
- Invalid inputs to `pairwise_comparison()` now trigger an explicit and informative error message.
- Fixed the handling of a `sample` column when using a quantile forecast format. Previously this resulted in an error.

# scoringutils 1.0.0

Major update to the package and most package functions with lots of breaking changes.

## Feature updates
- The function `eval_forecasts()` was replaced by a function `score()` with a much reduced set of function arguments.
- Functionality to summarise scores and to add relative skill scores was moved to a new function, `summarise_scores()`.
- New function `check_forecasts()` to analyse input data before scoring.
- New function `correlation()` to compute correlations between different metrics.
- New function `add_coverage()` to add coverage for specific central prediction intervals.
- New function `avail_forecasts()` that allows the user to visualise the number of available forecasts.
- New function `find_duplicates()` to find duplicate forecasts which cause an error.
- All plotting functions were renamed to begin with `plot_`. Arguments were simplified.
- The function `pit()` now works based on data.frames. The old `pit` function was renamed to `pit_sample()`. PIT p-values were removed entirely.
- `plot_pit()` now works directly with input as produced by `pit()`.
- Input types for `score()` were restricted to sample-based, quantile-based or binary forecasts.
- The function `brier_score()` now returns all Brier scores, rather than taking the mean before returning an output.
- `crps()`, `dss()` and `logs()` were renamed to `crps_sample()`, `dss_sample()`, and `logs_sample()`.

# scoringutils 0.1.8

## Feature updates

- Added a `sample_to_quantile()` function (https://github.com/epiforecasts/scoringutils/pull/223).
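The core idea of converting sample-based forecasts to quantiles can be sketched in base R (the real `sample_to_quantile()` operates on whole data.frames in scoringutils' long format; this toy example only shows the per-forecast computation):

```r
set.seed(1)
# 1000 posterior samples for a single forecast
samples <- rnorm(1000, mean = 10, sd = 2)

# Convert the samples to a set of predictive quantiles
quantile_levels <- c(0.05, 0.25, 0.5, 0.75, 0.95)
q <- quantile(samples, probs = quantile_levels)
q
```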
function (https://github.com/epiforecasts/scoringutils/pull/223)example_
.summary_metrics
was included that contains a summary of the metrics implemented in scoringutils
.check_forecasts()
that runs some basic checks on the input data and provides feedback.table[]
rather than as table
, such that they don’t have to be called twice to display the contents.pairwise_comparison()
- Added a function, `pairwise_comparison()`, that runs pairwise comparisons between models on the output of `eval_forecasts()`.
- Added functionality to compute relative skill within `eval_forecasts()`.

# scoringutils 0.1.6

## Feature updates

- `eval_forecasts()` can now handle a separate forecast and truth data set as input.
- `eval_forecasts()` now supports scoring point forecasts alongside quantiles in a quantile-based format. Currently the only metric used is the absolute error.

# scoringutils 0.1.5

## Package updates

- `eval_forecasts()` got a major rewrite. While functionality should be unchanged, the code should now be easier to maintain.
- `count_median_twice = FALSE` is now the default.
# Earlier versions

## Feature updates

- `correlation_plot()` shows correlation between metrics.
- `plot_ranges()` shows the contribution of different prediction intervals to some chosen metric.
- `plot_heatmap()` visualises scores as a heatmap.
- `plot_score_table()` shows a coloured summary table of scores.
- The `by` argument in `score` now has a slightly changed meaning. It now denotes the lowest possible grouping unit, i.e. the unit of one observation, and needs to be specified explicitly. The default is now `NULL`. The reason for this change is that most metrics need scoring on the observation level and this is the most consistent implementation of this principle. The `pit` function now receives its grouping from `summarise_by`. In a similar spirit, `summarise_by` has to be specified explicitly and e.g. doesn't assume anymore that you want 'range' to be included.
- `weigh = TRUE` is now the default option.
- Bias as well as calibration now take all quantiles into account.
- Added a `summarise_by` argument in `score()`. The summary can return the mean, the standard deviation as well as an arbitrary set of quantiles.
- `score()` can now return PIT histograms.
- Plotting now uses `ggplot2`.
- Scores were renamed to lower case: `Interval_score` is now `interval_score`, `CRPS` is now `crps`, etc.
- New metrics added to `score()`: bias, sharpness and calibration.
- Updated the `README`.