The joineRML
package implements methods for analyzing
data from multiple longitudinal studies in which the responses
from each subject consists of time-sequences of repeated measurements
and a possibly censored time-to-event outcome. The modelling framework
for the repeated measurements is the multivariate linear mixed effects
model. The model for the time-to-event outcome is a Cox proportional
hazards model with log-Gaussian frailty. Stochastic dependence is
captured by allowing the Gaussian random effects of the linear model to
be correlated with the frailty term of the Cox proportional hazards
model. For full details of the model, please consult the technical
vignette by running
vignette("technical", package = "joineRML")
The simplest way to explain the concepts of the package is through an
example. joineRML
comes with the data set
heart.valve
. Details of this data can be found in the help
file by running the command
help("heart.valve", package = "joineRML")
This data is in so-called long or unbalanced format:
library("joineRML")
data("heart.valve")
head(heart.valve)
## num sex age time fuyrs status grad log.grad lvmi log.lvmi ef
## 1 1 0 75.06027 0.0109589 4.956164 0 10 2.302585 118.98 4.778955 93
## 2 1 0 75.06027 3.6794520 4.956164 0 10 2.302585 118.98 4.778955 93
## 3 1 0 75.06027 4.6958900 4.956164 0 10 2.302585 137.63 4.924569 93
## 4 2 0 45.79452 6.3643840 9.663014 0 14 2.639057 114.93 4.744323 68
## 5 2 0 45.79452 7.3041100 9.663014 0 9 2.197225 109.80 4.698661 70
## 6 2 0 45.79452 8.3013700 9.663014 0 12 2.484907 157.40 5.058790 56
## bsa lvh prenyha redo size con.cabg creat dm acei lv emergenc hc sten.reg.mix
## 1 1.77 1 3 0 27 1 103 0 1 1 0 0 1
## 2 1.77 1 3 0 27 1 103 0 1 1 0 0 1
## 3 1.77 1 3 0 27 1 103 0 1 1 0 0 1
## 4 1.92 1 1 1 22 0 76 0 0 2 0 0 1
## 5 1.92 1 1 1 22 0 76 0 0 2 0 0 1
## 6 1.92 1 1 1 22 0 76 0 0 2 0 0 1
## hs
## 1 Stentless valve
## 2 Stentless valve
## 3 Stentless valve
## 4 Homograft
## 5 Homograft
## 6 Homograft
The data refer to 256 patients and are stored in the unbalanced
format, which is convenient here because measurement times were unique
to each subject. The data are stored as a single R object,
heart.valve
, which is a data frame of dimension 988 by 25.
The average number of repeated measurements per subject is therefore
988/256 = 3.86. As with any unbalanced data set, values of time-constant
variables are repeated over all rows that refer to the same subject. The
dimensionality of the data set can be confirmed by a call to the
dim()
function, whilst the names of the 25 variables can be
listed by a call to the names()
function:
dim(heart.valve)
## [1] 988 25
names(heart.valve)
## [1] "num" "sex" "age" "time" "fuyrs"
## [6] "status" "grad" "log.grad" "lvmi" "log.lvmi"
## [11] "ef" "bsa" "lvh" "prenyha" "redo"
## [16] "size" "con.cabg" "creat" "dm" "acei"
## [21] "lv" "emergenc" "hc" "sten.reg.mix" "hs"
We will only analyse a subset of this data, namely records with
case-complete data for heart valve gradient (grad
) and left
ventricular mass index (lvmi
):
<- heart.valve[!is.na(heart.valve$grad) & !is.na(heart.valve$lvmi), ] hvd
Strictly speaking, this is not necessary because
joineRML
can handle the situation of different measurement
schedules within subjects That is, a subject does not need to
have all multiple longitudinal outcomes recorded at each visit. It is
conceivable that some biomarkers will be measured more or less
frequently than others. For example, invasive measurements may only be
recorded annually, whereas a simple biomarker measurement might be
recorded more frequently. joineRML
can handle this
situation by specifying each longitudinal outcome its own data
frame.
The main function in the joineRML
package is the
mjoint()
function. Its main (required) arguments are:
formLongFixed
: a list (of length equal to the number
of longitudinal outcome types considered) of two-sided formulae
specifying the response on the left-hand side and the mean linear
predictor terms for the fixed effects in the linear mixed models on the
right-hand side.
formLongRandom
: a list (of same length as
formLongFixed
) of one-sided formulae specifying the model
for random effects in the linear mixed models.
formSurv
: a formula specifying the proportional
hazards regression model for the time-to-event outcome in the same
structure as for survival::coxph
.
data
: a list (of same length as
formLongFixed
) of data.frames; one for each longitudinal
outcome. It is assumed that the event time data is in the first
data.frame (i.e. data[[1]]
), unless the argument
survData
(which defaults to NULL
) is
specified. If K>1 and the data
are balanced within patients (i.e. multiple markers measured at common
measurement times), then one can specify data
as a data
frame rather than as a list.
timeVar
: the column name indicating the time
variable in the linear mixed effects model. If K>1 and the data frames have different
column names for time, then timeVar
can alternatively be
specified as a vector of strings of length K.
We can fit a bivariate joint model to the log-transformed valve
gradient and LVMI indices in the hvd
subset using
set.seed(12345)
<- mjoint(
fit formLongFixed = list("grad" = log.grad ~ time + sex + hs,
"lvmi" = log.lvmi ~ time + sex),
formLongRandom = list("grad" = ~ 1 | num,
"lvmi" = ~ time | num),
formSurv = Surv(fuyrs, status) ~ age,
data = list(hvd, hvd),
inits = list("gamma" = c(0.11, 1.51, 0.80)),
timeVar = "time")
## Running multivariate LMM EM algorithm to establish initial parameters...
## Finished multivariate LMM EM algorithm...
## EM algorithm has converged!
## Calculating post model fit statistics...
Details on the model estimation algorithm are provided in the
technical details vignette. We note here that this is not necessarily
the most appropriate model for the data, and is included only for the
purposes of demonstration. There are a number of other useful arguments
in the mjoint
function; for example, inits
for
specifying (partial) initial values, control
for
controlling the optimization algorithm, and verbose
for
monitoring the convergence output in real-time. A full list of all
arguments with explanation are given in the help documentation, accessed
by running help("mjoint")
.
Once we have a fitted mjoint
object, we can begin to
extract relevant information from it. Most summary statistics are
available from the summary
function:
summary(fit)
##
## Call:
## mjoint(formLongFixed = list(grad = log.grad ~ time + sex + hs,
## lvmi = log.lvmi ~ time + sex), formLongRandom = list(grad = ~1 |
## num, lvmi = ~time | num), formSurv = Surv(fuyrs, status) ~
## age, data = list(hvd, hvd), timeVar = "time", inits = list(gamma = c(0.11,
## 1.51, 0.8)))
##
## Data Descriptives:
##
## Event Process
## Number of subjects: 221
## Number of events: 47 (21.3%)
##
## Longitudinal Process
## Number of longitudinal outcomes: K = 2
## Number of observations:
## Outcome 1 (grad): n = 629
## Outcome 2 (lvmi): n = 629
##
## Joint Model Summary:
##
## Longitudinal Process: Multivariate linear mixed-effects model
## log.grad ~ time + sex + hs, random = ~1 | num
## log.lvmi ~ time + sex, random = ~time | num
## Event Process: Cox proportional hazards model
## Surv(fuyrs, status) ~ age
## Model fit statistics:
## log.Lik AIC BIC
## -991.0896 2018.179 2079.346
##
## Variance Components:
##
## Random effects variance covariance matrix
## (Intercept)_1 (Intercept)_2 time_2
## (Intercept)_1 0.106120 0.0195780 0.0035670
## (Intercept)_2 0.019578 0.1189300 -0.0063197
## time_2 0.003567 -0.0063197 0.0024783
## Standard Deviations: 0.32576 0.34486 0.049782
##
## Residual standard errors(s):
## sigma2_1 sigma2_2
## 0.5965479 0.1926074
##
## Coefficient Estimates:
##
## Longitudinal sub-model:
## Value Std.Err z-value p-value
## (Intercept)_1 2.5079 0.0641 39.1002 <0.0001
## time_1 -0.0042 0.0134 -0.3158 0.7521
## sex_1 0.1437 0.0769 1.8690 0.0616
## hsStentless valve_1 0.1878 0.0744 2.5248 0.0116
## (Intercept)_2 5.0896 0.0339 150.2610 <0.0001
## time_2 -0.0098 0.0068 -1.4322 0.1521
## sex_2 -0.1993 0.0547 -3.6414 0.0003
##
## Time-to-event sub-model:
## Value Std.Err z-value p-value
## age 0.1089 0.0160 6.7941 <0.0001
## gamma_1 1.5306 0.8926 1.7148 0.0864
## gamma_2 0.7928 0.6476 1.2242 0.2209
##
## Algorithm Summary:
## Total computational time: 36 secs
## EM algorithm computational time: 34 secs
## Convergence status: converged
## Convergence criterion: sas
## Final Monte Carlo sample size: 1232
## Standard errors calculated using method: approx
One can also extract the coefficients, fixed effects, and random effects using standard generic functions:
coef(fit)
## $D
## (Intercept)_1 (Intercept)_2 time_2
## (Intercept)_1 0.106120934 0.019578491 0.003567006
## (Intercept)_2 0.019578491 0.118926770 -0.006319693
## time_2 0.003567006 -0.006319693 0.002478283
##
## $beta
## (Intercept)_1 time_1 sex_1 hsStentless valve_1
## 2.507930393 -0.004218942 0.143744599 0.187840507
## (Intercept)_2 time_2 sex_2
## 5.089627175 -0.009779160 -0.199304559
##
## $sigma2
## sigma2_1 sigma2_2
## 0.3558694 0.0370976
##
## $haz
## [1] 0.001856804 0.001864430 0.002008645 0.002218714 0.002254153 0.002330517
## [7] 0.002387402 0.002390799 0.002428295 0.002444427 0.002458578 0.002666427
## [13] 0.002721670 0.002780504 0.002801212 0.002836222 0.002886282 0.003021364
## [19] 0.003152726 0.003358884 0.003641268 0.007625702 0.004240236 0.004459208
## [25] 0.004565031 0.004760109 0.005171155 0.005300564 0.006222395 0.006760472
## [31] 0.006929515 0.007143369 0.007255821 0.007852467 0.008521236 0.008876758
## [37] 0.009240505 0.010037252 0.011921242 0.012298386 0.013639916 0.016489795
## [43] 0.017010196 0.046441755 0.048568393 0.247872014
##
## $gamma
## age gamma_1 gamma_2
## 0.1089274 1.5306406 0.7927571
fixef(fit, process = "Longitudinal")
## (Intercept)_1 time_1 sex_1 hsStentless valve_1
## 2.507930393 -0.004218942 0.143744599 0.187840507
## (Intercept)_2 time_2 sex_2
## 5.089627175 -0.009779160 -0.199304559
fixef(fit, process = "Event")
## age gamma_1 gamma_2
## 0.1089274 1.5306406 0.7927571
head(ranef(fit))
## (Intercept)_1 (Intercept)_2 time_2
## 1 -0.20388173 -0.245507072 0.010585671
## 2 -0.04431516 -0.177032824 0.001841862
## 3 -0.01694762 0.006085043 0.010704819
## 4 -0.42828770 -0.597216526 0.012950479
## 5 -0.05225624 0.077231080 0.021770283
## 6 0.23799697 0.227354140 0.005746461
Although a model fit may indicate convergence, it is generally a good
idea to examine the convergence plots. These can be viewed using the
plot
function for each group of model parameters.
plot(fit, params = "gamma")
plot(fit, params = "beta")
Once an mjoint
model has converged, and assuming the
pfs
argument is TRUE
(default), then
approximated standard errors are calculated based on the empirical
information matrix of the profile likelihood at the maximizer.
Theoretically, these standard errors will be underestimated (see the
technical vignette). In principle, residual Monte Carlo error will
oppose this through an increase in uncertainty.
<- bootSE(fit, nboot = 100) fit.se
Bootstrapping is a computationally intensive method, possibly taking many hours to fit. For this reason, one can relax the control parameter constraints on the optimization algorithm for each bootstrap model; however, this will be at the possible expense of inflated standard errors due to Monte Carlo error.
We can call the bootSE
object to interrogate it
fit.se
or alternatively re-run the summary
command, passing the
additional argument of bootSE = fit.se
summary(fit, bootSE = fit.se)
joineRML
versus
joineR
There are a growing number of software options for fitting joint
models of a single longitudinal outcome and a single time-to-event
outcome; what we call here univariate joint models.
joineR
(version 1.2.7) is one package available in R for
fitting such models, however joineRML
can fit these models
too, since the univariate model is simply a special case of the
multivariate model. It is useful to contrast these two packages. There
are theoretical and practical implementation differences between the
packages beyond just univariate versus multivariate capability:
joineR
uses Gauss-Hermite quadrature (with 3 nodes)
for numerical integration, whereas joineRML
uses Monte
Carlo integration with an automated selection of sample size.
joineR
only allows for random-intercept models,
random-intercept and random-slope models, or a quadratic model.
joineRML
, on the other hand, allows for any random effects
structure.
joineR
only allows for specification of convergence
based on an absolute difference criterion.
joineR
does not calculate approximate standard
errors, and instead requires a bootstrap approach be used after the
model fit.
The current version of joineR
requires a data
pre-processing step in order to generate a joint.data
object, whereas joineRML
can work straight from the data
frame.
To fit a univariate model in joineR
we run the following
code for the hvd
data
library(joineR, quietly = TRUE)
<- UniqueVariables(hvd, var.col = c("fuyrs", "status"), id.col = "num")
hvd.surv <- UniqueVariables(hvd, "age", id.col = "num")
hvd.cov <- hvd[, c("num", "time", "log.lvmi")]
hvd.long
<- jointdata(longitudinal = hvd.long,
hvd.jd baseline = hvd.cov,
survival = hvd.surv,
id.col = "num",
time.col = "time")
<- joint(data = hvd.jd,
fit.joiner long.formula = log.lvmi ~ time + age,
surv.formula = Surv(fuyrs, status) ~ age,
model = "intslope")
summary(fit.joiner)
##
## Call:
## joint(data = hvd.jd, long.formula = log.lvmi ~ time + age, surv.formula = Surv(fuyrs,
## status) ~ age, model = "intslope")
##
## Random effects joint model
## Data: hvd.jd
## Log-likelihood: -373.2656
##
## Longitudinal sub-model fixed effects: log.lvmi ~ time + age
## (Intercept) 4.9979864721
## time -0.0096970070
## age 0.0004407409
##
## Survival sub-model fixed effects: Surv(fuyrs, status) ~ age
## age 0.1058821
##
## Latent association:
## gamma_0 1.072406
##
## Variance components:
## U_0 U_1 Residual
## 0.128470783 0.002561009 0.037055051
##
## Convergence at iteration: 18
##
## Number of observations: 629
## Number of groups: 221
To fit a univariate model in joineRML
we run the
following code for the hvd
data
set.seed(123)
<- mjoint(formLongFixed = log.lvmi ~ time + age,
fit.joinerml formLongRandom = ~ time | num,
formSurv = Surv(fuyrs, status) ~ age,
data = hvd,
timeVar = "time")
## EM algorithm has converged!
## Calculating post model fit statistics...
summary(fit.joinerml)
##
## Call:
## mjoint(formLongFixed = log.lvmi ~ time + age, formLongRandom = ~time |
## num, formSurv = Surv(fuyrs, status) ~ age, data = hvd, timeVar = "time")
##
## Data Descriptives:
##
## Event Process
## Number of subjects: 221
## Number of events: 47 (21.3%)
##
## Longitudinal Process
## Number of longitudinal outcomes: K = 1
## Number of observations:
## Outcome 1: n = 629
##
## Joint Model Summary:
##
## Longitudinal Process: Univariate linear mixed-effects model
## log.lvmi ~ time + age, random = ~time | num
## Event Process: Cox proportional hazards model
## Surv(fuyrs, status) ~ age
## Model fit statistics:
## log.Lik AIC BIC
## -373.2623 764.5247 795.1081
##
## Variance Components:
##
## Random effects variance covariance matrix
## (Intercept)_1 time_1
## (Intercept)_1 0.1282200 -0.0066785
## time_1 -0.0066785 0.0024842
## Standard Deviations: 0.35808 0.049842
##
## Residual standard errors(s):
## sigma2_1
## 0.1926676
##
## Coefficient Estimates:
##
## Longitudinal sub-model:
## Value Std.Err z-value p-value
## (Intercept)_1 4.9994 0.1405 35.5834 <0.0001
## time_1 -0.0094 0.0067 -1.4081 0.1591
## age_1 0.0004 0.0021 0.1984 0.8428
##
## Time-to-event sub-model:
## Value Std.Err z-value p-value
## age 0.1060 0.0149 7.0951 <0.0001
## gamma_1 1.0803 0.5416 1.9949 0.0461
##
## Algorithm Summary:
## Total computational time: 7.7 secs
## EM algorithm computational time: 7.2 secs
## Convergence status: converged
## Convergence criterion: sas
## Final Monte Carlo sample size: 511
## Standard errors calculated using method: approx
In addition to just comparing model parameter estimates, we can also extract the predicted (or posterior) random effects from each model and plot them.
<- as.numeric(row.names(fit.joiner$coefficients$random))
id <- order(id) # joineR rearranges patient ordering during EM fit
id.ord par(mfrow = c(1, 2))
plot(fit.joiner$coefficients$random[id.ord, 1], ranef(fit.joinerml)[, 1],
main = "Predicted random intercepts",
xlab = "joineR", ylab = "joineRML")
grid()
abline(a = 0, b = 1, col = 2, lty = "dashed")
plot(fit.joiner$coefficients$random[id.ord, 2], ranef(fit.joinerml)[, 2],
main = "Predicted random slopes",
xlab = "joineR", ylab = "joineRML")
grid()
abline(a = 0, b = 1, col = 2, lty = "dashed")