This vignette demonstrates use of a simple routine to do simulations and testing using Fleming-Harrington weighted logrank tests and the MaxCombo test. In addition, we demonstrate how to perform these tests with a dataset not generated by simulation routines within the package. Note that all \(p\)-values computed here are one-sided with small values indicating that the experimental treatment is favored.

The MaxCombo test has been posed as the maximum of multiple Fleming-Harrington weighted logrank tests (Harrington and Fleming (1982), Fleming and Harrington (2011)). Combination tests looking at a maximum of selected tests in this class have also been proposed; see Lee (2007), Roychoudhury et al. (2021), and Lin et al. (2020). The Fleming-Harrington class is indexed by the parameters \(\rho \geq 0\) and \(\gamma \geq 0\). We will denote these as FH(\(\rho, \gamma\)). This class includes the logrank test as FH(0, 0). Other tests of interest here include:

- FH(0, 1): a test that down-weights early events
- FH(1, 0): a test that down-weights late events
- FH(1, 1): a test that down-weights events increasingly as their quantiles differ from the median

`sim_fixed_n()`

We begin with a single trial simulation generated by the routine
`sim_fixed_n()`

using default arguments for that routine.
`sim_fixed_n()`

produces one record per test and data cutoff
method per simulation. Here we choose 3 tests (logrank = FH(0, 0), FH(0,
1) and FH(1, 1)). When more than one test is chosen the correlation
between tests is computed as shown by Karrison
(2016), in this case in the columns `V1`

,
`V2`

, `V3`

. The columns `rho`

,
`gamma`

indicate \(\rho\)
and \(\gamma\) used to compute the
test. `z`

is the FH(\(\rho,
\gamma\)) normal test statistic with variance 1 with a negative
value favoring experimental treatment. The variable `cut`

indicates how the data were cut for analysis, in this case at the
maximum of the targeted minimum follow-up after last enrollment and the
date at which the targeted event count was reached. `Sim`

is
a sequential index of the simulations performed.

```
set.seed(123)
x <- sim_fixed_n(
n_sim = 1,
timing_type = 5,
rho_gamma = data.frame(rho = c(0, 0, 1), gamma = c(0, 1, 1))
)
#> Backend uses sequential processing.
#> Loading required package: foreach
#> Loading required package: future
#>
#> Attaching package: 'future'
#> The following object is masked from 'package:survival':
#>
#> cluster
x |>
gt() |>
fmt_number(columns = c("ln_hr", "z", "duration", "v1", "v2", "v3"), decimals = 2)
```

method | parameter | estimation | se | z | p_value | v1 | v2 | v3 | event | ln_hr | cut | duration | sim |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|

MaxCombo | FH(0, 0) + FH(0, 1) + FH(1, 1) | - | - | −4.06 | 3.70532e-07 | 1.00 | 0.85 | 0.93 | 350 | −0.44 | Max(min follow-up, event cut) | 77.17 | 1 |

MaxCombo | FH(0, 0) + FH(0, 1) + FH(1, 1) | - | - | −4.04 | 3.70532e-07 | 0.85 | 1.00 | 0.94 | 350 | −0.44 | Max(min follow-up, event cut) | 77.17 | 1 |

MaxCombo | FH(0, 0) + FH(0, 1) + FH(1, 1) | - | - | −4.95 | 3.70532e-07 | 0.93 | 0.94 | 1.00 | 350 | −0.44 | Max(min follow-up, event cut) | 77.17 | 1 |

`sim_pw_surv()`

We begin with another simulation generated by
`sim_pw_surv()`

. Again, we use defaults for that routine.

```
set.seed(123)
s <- sim_pw_surv(n = 100)
s |>
head() |>
gt() |>
fmt_number(columns = c("enroll_time", "fail_time", "dropout_time", "cte"), decimals = 2)
```

stratum | enroll_time | treatment | fail_time | dropout_time | cte | fail |
---|---|---|---|---|---|---|

All | 0.02 | experimental | 23.29 | 1,287.17 | 23.32 | 1 |

All | 0.14 | control | 6.96 | 306.66 | 7.10 | 1 |

All | 0.25 | control | 16.96 | 1,761.75 | 17.21 | 1 |

All | 0.28 | experimental | 3.32 | 1,650.14 | 3.60 | 1 |

All | 0.46 | control | 19.08 | 787.98 | 19.53 | 1 |

All | 0.46 | experimental | 39.67 | 50.64 | 40.13 | 1 |

Once generated, we need to cut the data for analysis. Here we cut after 75 events.

tte | event | stratum | treatment |
---|---|---|---|

23.29 | 1 | All | experimental |

6.96 | 1 | All | control |

16.96 | 1 | All | control |

3.32 | 1 | All | experimental |

19.08 | 1 | All | control |

33.29 | 0 | All | experimental |

Now we can analyze this data. We begin with `s`

to show
how this can be done in a single line. In this case, we use the 4 test
combination suggested in Lin et al.
(2020), Roychoudhury et al.
(2021).

```
z <- s |>
cut_data_by_event(75) |>
maxcombo(rho = c(0, 0, 1, 1), gamma = c(0, 1, 0, 1))
z
#> $method
#> [1] "MaxCombo"
#>
#> $parameter
#> [1] "FH(0, 0) + FH(0, 1) + FH(1, 0) + FH(1, 1)"
#>
#> $z
#> [1] -2.511925 -2.907093 -1.899871 -3.119549
#>
#> $p_value
#> [1] 0.00204688
```

Suppose we want the \(p\)-value just
based on the logrank and FH(0, 1) and FH(1, 0) as suggested by Lee (2007). We remove the rows and columns
associated with FH(0, 0) and FH(1, 1) and then apply
`pvalue_maxcombo()`

.

For a trial not generated by `sim_fixed_n()`

, the process
is slightly more involved. We consider survival data not in the simtrial
format and show the transformation needed. In this case we use the small
`aml`

dataset from the survival package.

time | status | x |
---|---|---|

9 | 1 | Maintained |

13 | 1 | Maintained |

13 | 0 | Maintained |

18 | 1 | Maintained |

23 | 1 | Maintained |

28 | 0 | Maintained |

We rename variables and create a stratum variable as follows:

```
x <- aml |> transmute(
tte = time,
event = status,
stratum = "All",
treatment = case_when(
x == "Maintained" ~ "experimental",
x == "Nonmaintained" ~ "control"
)
)
x |>
head() |>
gt()
```

tte | event | stratum | treatment |
---|---|---|---|

9 | 1 | All | experimental |

13 | 1 | All | experimental |

13 | 0 | All | experimental |

18 | 1 | All | experimental |

23 | 1 | All | experimental |

28 | 0 | All | experimental |

Now we analyze the data with a MaxCombo with the logrank and FH(0, 1) and compute a \(p\)-value.

We now consider the example simulation from the
`pvalue_maxcombo()`

help file to demonstrate how to simulate
power for the MaxCombo test. However, we increase the number of
simulations to 100 in this case; a larger number should be used (e.g.,
1000) for a better estimate of design properties. Here we will test at
the \(\alpha=0.001\) level.

```
set.seed(123)
# Only use cut events + min follow-up
x <- sim_fixed_n(
n_sim = 100,
timing_type = 5,
rho_gamma = data.frame(rho = c(0, 0, 1), gamma = c(0, 1, 1))
)
# MaxCombo power estimate for cutoff at max of targeted events, minimum follow-up
x |>
group_by(sim) |>
filter(row_number() == 1) |>
ungroup() |>
summarize(power = mean(p_value < .001))
#> # A tibble: 1 × 1
#> power
#> <dbl>
#> 1 0.79
```

We note the use of `group_map`

in the above produces a
list of \(p\)-values for each
simulation. It would be nice to have something that worked more like
`dplyr::summarize()`

to avoid `unlist()`

and to
allow evaluating, say, multiple data cutoff methods. The latter can be
done without having to re-run all simulations as follows, demonstrated
with a smaller number of simulations.

```
# Only use cuts for events and events + min follow-up
set.seed(123)
x <- sim_fixed_n(
n_sim = 100,
timing_type = c(2, 5),
rho_gamma = data.frame(rho = 0, gamma = c(0, 1))
)
```

Now we compute a \(p\)-value separately for each cut type, first for targeted event count.

```
# Subset to targeted events cutoff tests
# This chunk will be updated after the development of sim_gs_n and sim_fixed_n
x |>
filter(cut == "Targeted events") |>
group_by(sim) |>
filter(row_number() == 1) |>
ungroup() |>
summarize(power = mean(p_value < .025))
#> # A tibble: 1 × 1
#> power
#> <dbl>
#> 1 0.95
```

Now we use the later of targeted events and minimum follow-up cutoffs.

Fleming, Thomas R, and David P Harrington. 2011. *Counting Processes
and Survival Analysis*. Vol. 169. John Wiley & Sons.

Harrington, David P, and Thomas R Fleming. 1982. “A Class of Rank
Test Procedures for Censored Survival Data.” *Biometrika*
69 (3): 553–66.

Karrison, Theodore G. 2016. “Versatile Tests for Comparing
Survival Curves Based on Weighted Log-Rank Statistics.” *The
Stata Journal* 16 (3): 678–90.

Lee, Seung-Hwan. 2007. “On the Versatility of the Combination of
the Weighted Log-Rank Statistics.” *Computational Statistics
& Data Analysis* 51 (12): 6557–64.

Lin, Ray S, Ji Lin, Satrajit Roychoudhury, Keaven M Anderson, Tianle Hu,
Bo Huang, Larry F Leon, et al. 2020. “Alternative Analysis Methods
for Time to Event Endpoints Under Nonproportional Hazards: A Comparative
Analysis.” *Statistics in Biopharmaceutical Research* 12
(2): 187–98.

Roychoudhury, Satrajit, Keaven M Anderson, Jiabu Ye, and Pralay
Mukhopadhyay. 2021. “Robust Design and Analysis of Clinical Trials
with Nonproportional Hazards: A Straw Man Guidance from a Cross-Pharma
Working Group.” *Statistics in Biopharmaceutical
Research*, 1–15.