The BEBOP phase II trial methodology incorporates predictive baseline information to study co-primary efficacy and toxicity outcomes. BEBOP stands for Bayesian Evaluation of Bivariate Binary Outcomes and Predictive Information. It was developed for the PePS2 trial, which investigates pembrolizumab in non-small-cell lung cancer (NSCLC) patients with performance status 2 (PS2). A major factor in the PePS2 trial is the data on PS0/1 NSCLC patients published by Garon et al. (2015), which show that patients with greater PD-L1 tumour proportion scores are more likely to achieve an objective response. Garon et al. introduce this predictive biomarker and validate a three-level categorisation of PD-L1 score: Low, Medium and High. There was also a suggestion (albeit not shown to be statistically significant) that previously untreated patients are more likely to respond. These same variables will likely be predictive of response in the PS2 population. Brock et al. (publication in submission) developed BEBOP to make efficient use of this information.
BEBOP re-uses the probability model at the core of the EffTox design (Thall and Cook 2004; Thall et al. 2014). Let x represent a vector of predictive baseline variables and \boldsymbol{\theta} a vector of parameters. The marginal probabilities of efficacy and toxicity are estimated using general functions \text{logit } \pi_E(x, \boldsymbol{\theta}) and \text{logit } \pi_T(x, \boldsymbol{\theta}) to be specified by the user.
In PePS2, we have x_{i1} = 1 if a patient has been previously treated and x_{i1} = 0 otherwise. To convey PD-L1 expression, we use x_{i2} = 1 and x_{i3} = 0 if a patient has a Low PD-L1 score; x_{i2} = 0 and x_{i3} = 1 if a patient has a Medium PD-L1 score; and x_{i2} = 0 and x_{i3} = 0 if a patient has a High PD-L1 score. Thus, we have x_i = (x_{i1}, x_{i2}, x_{i3}). The marginal efficacy and toxicity functions in PePS2 take the form
\text{logit } \pi_E(x_i, \boldsymbol{\theta}) = \alpha + \beta x_{i1} + \gamma x_{i2} + \zeta x_{i3}
\text{logit } \pi_T(x_i, \boldsymbol{\theta}) = \lambda
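To make the two marginal models concrete, the following base R sketch evaluates them for the six combinations of pre-treatment status and PD-L1 category. The parameter values are hypothetical, chosen only for illustration; they are neither the PePS2 priors nor fitted estimates.

```r
# Hypothetical parameter values, for illustration only
alpha  <- 0.0   # log-odds of efficacy for untreated, High PD-L1 patients
beta   <- -0.5  # effect of previous treatment
gamma  <- -1.5  # effect of Low PD-L1 (relative to High)
zeta   <- -1.0  # effect of Medium PD-L1 (relative to High)
lambda <- -2.2  # log-odds of toxicity, common to all patients

# Covariate coding for the six combinations of x1, x2 and x3
cohorts <- data.frame(
  x1 = c(0, 0, 0, 1, 1, 1),
  x2 = c(1, 0, 0, 1, 0, 0),
  x3 = c(0, 1, 0, 0, 1, 0)
)

# plogis() is the inverse of the logit function
logit_eff <- with(cohorts, alpha + beta * x1 + gamma * x2 + zeta * x3)
prob_eff  <- plogis(logit_eff)
prob_tox  <- rep(plogis(lambda), nrow(cohorts))

round(cbind(cohorts, prob_eff, prob_tox), 3)
```

Because High PD-L1 is the reference category, \alpha alone determines the efficacy rate for previously untreated patients with a High score.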
As with EffTox, let (Y_j, Z_j) be random variables each taking values in \{0, 1\} representing the presence of efficacy and toxicity in patient j. The efficacy and toxicity events are associated by the joint probability function
\text{Pr}(Y = a, Z = b) = \pi_{a,b}(\pi_E, \pi_T) = (\pi_E)^a (1-\pi_E)^{1-a} (\pi_T)^b (1-\pi_T)^{1-b} + (-1)^{a+b} (\pi_E) (1-\pi_E) (\pi_T) (1-\pi_T) \frac{e^\psi-1}{e^\psi+1}.
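As a quick check of the joint probability function, here is a minimal base R sketch. The function name joint_prob and the input values are hypothetical. It confirms that the four cell probabilities sum to one, that the marginal efficacy rate is recovered, and that \psi = 0 corresponds to independence.

```r
# Joint probability of efficacy outcome a and toxicity outcome b.
# `joint_prob` is a hypothetical helper written for this illustration.
joint_prob <- function(a, b, prob_eff, prob_tox, psi) {
  marginal <- prob_eff^a * (1 - prob_eff)^(1 - a) *
    prob_tox^b * (1 - prob_tox)^(1 - b)
  association <- (-1)^(a + b) * prob_eff * (1 - prob_eff) *
    prob_tox * (1 - prob_tox) * (exp(psi) - 1) / (exp(psi) + 1)
  marginal + association
}

# Evaluate all four outcome combinations for illustrative inputs
outcomes <- expand.grid(a = 0:1, b = 0:1)
outcomes$prob <- with(outcomes,
                      joint_prob(a, b, prob_eff = 0.3, prob_tox = 0.1, psi = 1))

sum(outcomes$prob)                   # the four cells sum to 1
sum(outcomes$prob[outcomes$a == 1])  # recovers the marginal efficacy rate
joint_prob(1, 1, 0.3, 0.1, psi = 0)  # psi = 0 gives the independent product
```

The association term cancels when summing over either outcome, so \psi changes the dependence between the events without altering the marginal rates.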
The complete vector of parameters is \boldsymbol{\theta} = (\alpha, \beta, \gamma, \zeta, \lambda, \psi). Normal priors are specified for the elements of \boldsymbol{\theta}.
The treatment is acceptable for patients with predictive vector x if
\text{Pr}\left\{ \pi_E(x, \boldsymbol{\theta}) > \underline{\pi}_E | \mathcal{D} \right\} > p_E
and
\text{Pr}\left\{ \pi_T(x, \boldsymbol{\theta}) < \overline{\pi}_T | \mathcal{D} \right\} > p_T
where \underline{\pi}_E, \overline{\pi}_T, p_E, p_T are chosen for clinical relevance.
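The two acceptance criteria can be evaluated directly from posterior draws. The sketch below uses Beta-distributed draws as a stand-in for MCMC output from the fitted model, and the certainty levels p_eff and p_tox are hypothetical values picked for illustration.

```r
set.seed(1)

# Stand-in posterior draws for one cohort; in practice these would be
# MCMC samples of the efficacy and toxicity rates from the fitted model
draws_eff <- rbeta(4000, 6, 14)
draws_tox <- rbeta(4000, 2, 18)

min_eff <- 0.1  # lower efficacy threshold (illustrative)
max_tox <- 0.3  # upper toxicity threshold (illustrative)
p_eff   <- 0.7  # hypothetical required certainty for efficacy
p_tox   <- 0.9  # hypothetical required certainty for toxicity

# Posterior probabilities are estimated by proportions of draws
prob_acc_eff <- mean(draws_eff > min_eff)
prob_acc_tox <- mean(draws_tox < max_tox)

# The treatment is acceptable only if both criteria hold
accept <- prob_acc_eff > p_eff && prob_acc_tox > p_tox
c(prob_acc_eff = prob_acc_eff, prob_acc_tox = prob_acc_tox, accept = accept)
```

Each posterior probability is simply the share of draws that fall on the favourable side of its threshold, so the rule needs no distributional approximation.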
PePS2 is an all-comers trial: patients are admitted regardless of their PD-L1 score or pre-treatment status. This is motivated by the dearth of treatment options for PS2 NSCLC patients who cannot use chemotherapy. The design allows the predictive information to effectively stratify the analysis without stratifying recruitment. The statistical design uses the common Bayesian tool of borrowing strength across groups to improve the performance of the analysis.
trialr
The cohorts in the PePS2 trial are
i | Pretreated | PDL1 | x1 | x2 | x3 |
---|---|---|---|---|---|
1 | FALSE | Low | 0 | 1 | 0 |
2 | FALSE | Medium | 0 | 0 | 1 |
3 | FALSE | High | 0 | 0 | 0 |
4 | TRUE | Low | 1 | 1 | 0 |
5 | TRUE | Medium | 1 | 0 | 1 |
6 | TRUE | High | 1 | 0 | 0 |
The trial uses a sample size of 60. Let us simulate a set of outcomes with the following efficacy and toxicity rates
library(trialr)
peps2_sc <- function() peps2_get_data(num_patients = 60,
                                      prob_eff = c(0.167, 0.192, 0.5, 0.091, 0.156, 0.439),
                                      prob_tox = rep(0.1, 6),
                                      eff_tox_or = rep(1, 6))
set.seed(123)
dat <- peps2_sc()
In this example, we have used efficacy rates that increase with PD-L1 score and are slightly higher in previously untreated patients. We use a uniform toxicity rate of 10% across all cohorts, and no association between efficacy and toxicity events, represented by odds ratios equal to 1.
The dat object contains, for example, the prior parameters
c(dat$alpha_mean, dat$alpha_sd)
## [1] -2.2 2.0
and simulated predictive variables and efficacy and toxicity outcomes
knitr::kable(
  head(with(dat, data.frame(eff, tox, x1, x2, x3)), 10)
)
eff | tox | x1 | x2 | x3 |
---|---|---|---|---|
0 | 0 | 0 | 1 | 0 |
0 | 0 | 0 | 1 | 0 |
0 | 0 | 0 | 1 | 0 |
0 | 0 | 0 | 1 | 0 |
0 | 1 | 0 | 0 | 1 |
0 | 0 | 0 | 0 | 1 |
0 | 0 | 0 | 0 | 1 |
1 | 0 | 0 | 0 | 1 |
0 | 0 | 0 | 0 | 1 |
0 | 0 | 0 | 0 | 1 |
We fit the data to the BEBOP model and obtain samples from the posterior distribution using rstan. The BebopInPeps2 model is provided by trialr and compiled when the package is installed.
fit <- stan_peps2(dat$eff, dat$tox, dat$cohorts)
It is informative to view plots of posterior beliefs. Posterior samples of parameter values are available but they are less meaningful to us than the modelled efficacy rates, for example. View posterior distributions using code like
rstan::plot(fit, pars = 'prob_eff')
We see that the modelled rates of efficacy are highest in cohorts 3 and 6, but likely to be greater than the critical threshold of 10% in cohort 2 and perhaps cohort 5 as well. In contrast, there is not much evidence of clinical benefit in cohorts 1 and 4. This is confirmed by invoking the formally described analysis and associated decision rules.
decision <- peps2_process(fit)
knitr::kable(
  with(decision, data.frame(ProbEff, ProbAccEff, ProbTox, ProbAccTox, Accept)),
  digits = 3
)
ProbEff | ProbAccEff | ProbTox | ProbAccTox | Accept |
---|---|---|---|---|
0.073 | 0.254 | 0.068 | 1 | FALSE |
0.249 | 0.979 | 0.068 | 1 | TRUE |
0.424 | 0.992 | 0.068 | 1 | TRUE |
0.040 | 0.093 | 0.068 | 1 | FALSE |
0.150 | 0.742 | 0.068 | 1 | TRUE |
0.282 | 0.948 | 0.068 | 1 | TRUE |
ProbAccEff is the posterior probability that the efficacy rate in a cohort is greater than the 10% threshold. ProbAccTox is the probability that toxicity is less than 30%. The treatment is acceptable in a cohort if it is sufficiently efficacious and non-toxic. We see that in this simulated iteration, the treatment would be approved in all cohorts except 1 and 4.
We can perform the simulations on a greater number of iterations to learn about the operating characteristics of the design.
set.seed(123)
run_sims <- function(num_sims = 10, sample_data_func = peps2_sc,
                     summarise_func = peps2_process, ...) {
  sims <- list()
  for(i in seq_len(num_sims)) {
    print(i)  # report progress
    # Simulate one trial, fit the model, and summarise the posterior
    dat <- sample_data_func()
    fit <- stan_peps2(dat$eff, dat$tox, dat$cohorts, ...)
    sim <- summarise_func(fit)
    sims[[i]] <- sim
  }
  return(sims)
}
sims <- run_sims(num_sims = 10, sample_data_func = peps2_sc,
                 summarise_func = peps2_process)
In run_sims, the second and third arguments are delegates to simulate trial outcomes and post-process the rstan sample, respectively. The outcome sampling delegate is called without arguments. The post-process delegate is called with a single argument, the posterior sample from rstan (e.g. fit above). The objects returned by the post-process delegate form the items in the sims list that is returned to the user by run_sims.
Thus, the probability of approving the treatment in each cohort using the statistical design in this scenario can be calculated using
apply(sapply(sims, function(x) x$Accept), 1, mean)
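To see what this calculation does without running Stan, the sketch below applies it to a small mock list. The four sets of Accept decisions are invented for illustration; they merely stand in for the items returned by run_sims.

```r
# Mock simulation results: each item holds Accept decisions for six cohorts.
# These values are invented purely to illustrate the calculation.
sims <- list(
  list(Accept = c(FALSE, TRUE,  TRUE, FALSE, TRUE,  TRUE)),
  list(Accept = c(FALSE, FALSE, TRUE, FALSE, FALSE, TRUE)),
  list(Accept = c(TRUE,  TRUE,  TRUE, FALSE, TRUE,  TRUE)),
  list(Accept = c(FALSE, TRUE,  TRUE, FALSE, FALSE, FALSE))
)

# sapply() binds the decisions into a 6 x 4 matrix: cohorts by iterations
accept_matrix <- sapply(sims, function(x) x$Accept)

# Averaging across iterations estimates the approval probability per cohort
prob_approve <- apply(accept_matrix, 1, mean)
prob_approve
```

Each row mean is the proportion of simulated trials in which the treatment was accepted in that cohort; rowMeans(accept_matrix) gives the same result.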
trialr is available at https://github.com/brockk/trialr and https://CRAN.R-project.org/package=trialr