How can I predict events on a new dataset?

Load packages

library(recforest)
library(dplyr)

Prepare data

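We use the built-in dataset bladder1_recforest for this example. We split its individuals into two disjoint subsamples: 100 randomly drawn individuals for training the model and the remaining individuals for testing it. The split is made on id, so all rows belonging to one individual end up in the same subsample.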

data("bladder1_recforest")

id_individuals_bladder1_recforest <- unique(bladder1_recforest$id)

# Fix the seed so that the random train/test split is reproducible
set.seed(1234)
train_ids <- sample(id_individuals_bladder1_recforest, size = 100, replace = FALSE)
test_ids <- setdiff(id_individuals_bladder1_recforest, train_ids)

train_bladder1_recforest <- bladder1_recforest %>%
  filter(id %in% train_ids)

test_bladder1_recforest <- bladder1_recforest %>%
  filter(id %in% test_ids)
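
The reason for splitting on id rather than on rows can be checked on a small example. The sketch below uses a made-up long-format data frame (the object names toy, train_ids_toy and test_ids_toy are hypothetical and not part of recforest) and verifies that no individual appears in both subsets and that no row is lost.

```r
# Toy long-format data: several rows (follow-up intervals) per individual.
# All values are made up for illustration only.
toy <- data.frame(
  id      = c(1, 1, 2, 3, 3, 3, 4, 5, 5, 6),
  t.start = c(0, 5, 0, 0, 3, 9, 0, 0, 7, 0),
  t.stop  = c(5, 12, 8, 3, 9, 15, 10, 7, 20, 6),
  event   = c(1, 0, 0, 1, 1, 0, 0, 1, 0, 0)
)

set.seed(42)  # make the toy split reproducible
ids <- unique(toy$id)
train_ids_toy <- sample(ids, size = 4, replace = FALSE)
test_ids_toy <- setdiff(ids, train_ids_toy)

# Keep every row of an individual in the same subset
toy_train <- toy[toy$id %in% train_ids_toy, ]
toy_test <- toy[toy$id %in% test_ids_toy, ]

# No individual is shared between subsets, and no row is dropped
stopifnot(length(intersect(toy_train$id, toy_test$id)) == 0)
stopifnot(nrow(toy_train) + nrow(toy_test) == nrow(toy))
```

Sampling rows instead of ids would let intervals from the same individual land in both subsets, which would make the test set less independent of the training set.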

Train a recforest model

The hyperparameters below are fixed by hand for illustration; in a real-world analysis they should be tuned. Because there are only three predictors, mtry is set to 2. For further details on each hyperparameter, see ?train_forest.

set.seed(1234)
trained_forest <- train_forest(
  data = train_bladder1_recforest,
  id_var = "id",
  covariates = c("treatment", "number", "size"),
  time_vars = c("t.start", "t.stop"),
  death_var = "death",
  event = "event",
  n_trees = 3,
  n_bootstrap = round(2 * length(train_ids) / 3),
  mtry = 2,
  minsplit = 3,
  nodesize = 15,
  method = "NAa",
  min_score = 5,
  max_nodes = 20,
  seed = 111,
  parallel = FALSE,
  verbose = FALSE
)
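
Since the hyperparameters are meant to be optimized in practice, one possible starting point is a grid of candidate values. The sketch below only builds such a grid with base R; the candidate values are arbitrary assumptions, not recommendations from the package, and each row would still have to be passed to train_forest() and scored with a metric of your choice.

```r
# Hypothetical candidate values for three of the hyperparameters used above.
# mtry cannot exceed the number of covariates (3 in this example).
param_grid <- expand.grid(
  mtry     = c(1, 2, 3),
  nodesize = c(10, 15, 20),
  n_trees  = c(50, 100)
)

# One row per combination of candidate values
nrow(param_grid)
head(param_grid)
```

A full grid grows multiplicatively with each added hyperparameter, so a random subset of rows may be preferable when training is slow.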

Predict on new data

Predictions from a recforest model are the expected mean cumulative number of recurrent events for each individual at the end of follow-up. Evaluation on new data with the three metrics (C-index for recurrent events, integrated MSE for recurrent events and integrated score for recurrent events) will be available soon.

predictions <- predict(
  trained_forest,
  newdata = test_bladder1_recforest,
  id_var = "id",
  covariates = c("treatment", "number", "size"),
  time_vars = c("t.start", "t.stop"),
  death_var = "death"
)
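
Until the dedicated metrics are available, a rough sanity check is to place the predictions next to the number of events actually observed for each individual. The sketch below uses made-up data and assumes that the predictions form a numeric vector with one value per individual, in the order of unique(id); that shape is an assumption to verify against your own predict() output, and toy_test and toy_predictions are hypothetical stand-ins.

```r
# Toy test set in long format (made-up values)
toy_test <- data.frame(
  id    = c(1, 1, 1, 2, 3, 3),
  event = c(1, 1, 0, 0, 1, 0)
)

# Stand-in for the output of predict(): ASSUMED to hold one numeric value
# per individual, ordered like unique(toy_test$id)
toy_predictions <- c(1.6, 0.4, 0.9)

# Observed number of events per individual
observed <- aggregate(event ~ id, data = toy_test, FUN = sum)
names(observed)[2] <- "n_events_observed"

# aggregate() sorts by id, so align the predictions explicitly
observed$n_events_predicted <-
  toy_predictions[match(observed$id, unique(toy_test$id))]

observed
```

This comparison is only descriptive: observed counts depend on each individual's length of follow-up, so it does not replace a proper evaluation metric.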