Before starting, it should be mentioned that the sole purpose of this vignette is to provide intuitive and easily replicable instructions on how to use the FVDDPpkg package on . For this reason, the underlying theory will not be developed except in the minimum necessary terms; for more information, please refer to the bibliography cited here.
First of all, import the package.
As shown in the work of Papaspiliopoulos, Ruggiero, and Spanò (2016) Fleming-Viot Dependent Dirichlet Processes (FVDDP), conditioned on observed data Y, take the general form of finite mixtures of Dirichlet Processes: in fact Xt | Y∼∑m∈MwmΠα+∑Kj=1mjδy∗j where
The derivation of this model, which stems from the study of population genetics, is done by exploiting the concept of duality for Markov processes (Papaspiliopoulos and Ruggiero (2014)), applying it to the results of Ethier and Griffiths (1993), among the others, on the seminal work of Fleming and Viot (1979).
In order to understand how to recover the previous expression, start
noting that, unconditional on observed data Xt0∼Πα, where the
time is arbitrarily set t=t0. This
means that, while no data is included within the model, FVDDP can be
fully characterized by θ and
P0. The creation of a process is
carried out using the function initialize
The user has to
specify the positive real number theta
and two functions,
the first to sample from P0
(sampling.f
) and the second to evaluate its p.d.f. or
p.m.f. (density.f
), depending whether it is atomic or not;
this is specified by the last argument, atomic
. The
function returns an object of class fvddp
.
FVDDP = initialize(theta = 1.28, sampling.f = function(x) rpois(x, 5),
density.f = function(x) dpois(x, 5), TRUE)
FVDDP
#> theta: 1.28
#>
#> P0.sample: function(x) rpois(x, 5)
#>
#> P0.density/mass: function(x) dpois(x, 5)
#>
#> is.atomic: TRUE
#>
#> Empty Process
In this chunk of code, for example, θ=1.28 and P0∼Po(5). Note that when printing the process, it is explicitly stated that no data has been included within the model.
Updating the process with data collected at time t0 stored in the vector Y0, the form of the latent signal becomes Xt0 | Y0∼Πα+∑Kj=1mjδy∗j. In some sense, this already is the mixture expressed in the general formula, under the specification that the vector y∗ collects the unique values observed at time t0, m stores the multiplicities of Y0 with respect to y∗, and M={m}: this implies that wm=1.
The update is performed by means of the update()
command, whose arguments are an fvddp
object and a numeric
vector. The returned object will include the information provided by
Y0. In particular:
y.star
.M
of size |M|×K, where |M| denotes the cardinality of M and K is the length of y∗. Each vector m is stored as a row of
M
.w
of size |M|. Its
j-th element represents the weight
associated to the j-th row of
M
.FVDDP
#> theta: 1.28
#>
#> P0.sample: function(x) rpois(x, 5)
#>
#> P0.density/mass: function(x) dpois(x, 5)
#>
#> is.atomic: TRUE
#>
#> Unique values (y.star):
#> [1] 4 7 9
#>
#> Multiplicities (M):
#> 4 7 9
#> [1,] 1 2 1
#>
#> Weights (w):
#> [1] 1
As one could expect, updating the signal with the vector (4,7,7,9) (the order of the input does not matter) leads to a degenerate mixture with y∗=(4,7,9) and M={(1,2,1)}.
Updating a non-empty process will have a different effect: suppose to know the law of Xtn | Y0,…,Yn−1∼∑m∈MwmΠα+∑Kj=1mjδy∗j where Yj denotes a vector of values observed at time tj, then Xtn|Y0…,Yn∼∑m∈(M+n)wϕmΠα+∑Kj=1mjδy∗j where n=(n1,…,nK) is the vector of multiplicities of Yn according to the unique values collected up to the same time, the new weights are such that wϕm∝wmPU(n|m) where PU(m|n) denotes the probability of drawing a vector of multiplicities n starting from m via Polya urn sampling scheme under θ and P0 specified by the model, and M+n={m+n:m∈M}.. Hence, the following changes will be applied to the input process of the function:
y.star
.M
.For the details of the role of Polya urn scheme on the update of mixtures of Dirichlet Processes, see Antoniak (1974) and Blackwell and MacQueen (1973).
The propagation of the signal, also known as prediction, aims to estimate the state of the process at a time after the data is collected: in other words, in the future. If updating the signal one can get Xtn | Y0,…,Yn, the use of the propagation leads to Xtn+t | Y0,…,Yn∼∑n∈L(M)wψnΠα+∑Kj=1njδy∗j. This means that the probability mass is shifted to a set L(M)={n∈M:∃ m∈M:n≤m} where the notation n≤m implies that nj≤mj ∀j∈{1,…,K}. The new weights are such that wϕn=∑m∈M:n≤mwmpm,n(t) and pm,n(t) represents the probability of reaching n starting from m in a time t for a K-dimensional death process, as shown in Tavaré (1984); however, the exact value of such probability, stated in Papaspiliopoulos, Ruggiero, and Spanò (2016) is pm,n(t)={e−λ|m|tif n=mC|m|,|n|(t)MVH(n;|n|,m)if n<m where λ|m|=|m|(θ+|m|−1)2 and C|m|,|n|(t)=(|m|∏h=|n|+1λh)(−1)|m|−|n||m|∑k=|n|e−λkt∏|n|≤h≤|m|,h≠k(λk−λh) and |m| represents the L1 norm (i.e. the sum) of the vector m.
The propagate()
function can be exploited to propagate
the signal. Its arguments are an fvddp
object and a
(positive) time. The result is the propagated process, whose matrix
M
will be larger and whose weights will be as described in
the formulae above.
FVDDP
#> theta: 1.28
#>
#> P0.sample: function(x) rpois(x, 5)
#>
#> P0.density/mass: function(x) dpois(x, 5)
#>
#> is.atomic: TRUE
#>
#> Unique values (y.star):
#> [1] 4 7 9
#>
#> Multiplicities (M):
#> 4 7 9
#> [1,] 0 0 0
#> [2,] 1 0 0
#> [3,] 0 1 0
#> [4,] 1 1 0
#> [5,] 0 2 0
#> [6,] 1 2 0
#> [7,] 0 0 1
#> [8,] 1 0 1
#> [9,] 0 1 1
#> [10,] 1 1 1
#> [11,] 0 2 1
#> [12,] 1 2 1
#>
#> Weights (w):
#> [1] 0.060274058 0.099035520 0.198071041 0.142898157 0.071449079 0.027252056
#> [7] 0.099035520 0.071449079 0.142898157 0.054504111 0.027252056 0.005881167
The example shows the propagation of the signal introduced in the
previous section, with t=0.6.
Note that y.star
does not vary. Also, note that in examples
like this, it is sufficient a time t≃2 to shift almost all the mass to the component of the mixture
characterized by the zero vector.
FVDDP
#> theta: 1.28
#>
#> P0.sample: function(x) rpois(x, 5)
#>
#> P0.density/mass: function(x) dpois(x, 5)
#> <bytecode: 0x12b7a6180>
#>
#> is.atomic: TRUE
#>
#> Unique values (y.star):
#> [1] 4 5 7 9 10
#>
#> Multiplicities (M):
#> 4 5 7 9 10
#> [1,] 1 1 2 0 2
#> [2,] 2 1 2 0 2
#> [3,] 1 1 3 0 2
#> [4,] 2 1 3 0 2
#> [5,] 1 1 4 0 2
#> [6,] 2 1 4 0 2
#> [7,] 1 1 2 1 2
#> [8,] 2 1 2 1 2
#> [9,] 1 1 3 1 2
#> [10,] 2 1 3 1 2
#> [11,] 1 1 4 1 2
#> [12,] 2 1 4 1 2
#>
#> Weights (w):
#> [1] 0.032822343 0.051700651 0.302672433 0.327846322 0.083102652 0.061084451
#> [7] 0.009482191 0.010270845 0.060128867 0.044197613 0.011203233 0.005488398
The latter chunk shows an application of update()
on a
larger process. A larger vector y.new
may induce large
variations in the weights. Being it of size 3, the example does not cause an
immediately recognizable effect.
In the theory of Hidden Markov Models, the smoothing operator is used to infer the state of the signal given observations from the past, the present and the future. In other words, one can estimate Xt when t≤tn, exploiting all collected data.
To do so, it has been shown by Ascolani, Lijoi, and Ruggiero (2023) that it is required to create two processes. The first has to be propagated forward from t0 to t−i−1 as in the previous sections; the second one has to be run backward using the same strategy: initialize and update it at tn and propagate it towards tn−1 (with a positive time tn−tn−1 in the function), and sequentially update and propagate until ti+1 is reached.
Doing this, one will get that Xti−1 | Y0,…,Yi−1∼∑ni−1∈Mi−1uni−1Πα+∑Kj=1ni−1,jδy∗j and Xti+1 | Yi+1,…,Yn=∑ni+1∈Mi+1vni+1Πα+∑Kj=1ni+1,jδy∗j where the subscript i−1 and i+1 are necessary to specify elements from the past or the future mixture, v stands for the weights and for example ni−1,j is the j-th component of the vector ni−1 (same for ni+1).
Provided this description based on available data from past and future, call ni the multiplicities generated by the vector Yi. Then
Xti | Xti−1,Xti+1,Yi∼∑ni−1∈Mi−1∑ni+1∈Mi+1uni−1vni+1∑(ki−1,ki+1)∈Dni−1,ni+1wni−1,ni+1ki−1,ni,ki+1Πα+∑Kj=1(ki−1,j+ni,j+ki+1,j)δy∗j where:
if P0 is non-atomic, define the sets Di−1:={j∈{1,…,K}:ni−1,j>0 and either ni,j>0 or ni+1,j>0}, Di+1:={j∈{1,…,K}:ni+1,j>0 and either ni,j>0 or ni−1,j>0} and S:=Di−1∪Di+1 to express the indices of shared values among different times. Then Dni−1,ni+1={(ki−1,ki+1):ki−1≤ni−1 and ki−1,j>0 ∀ j∈Di−1,ki+1≤ni+1 and ki+1,j>0 ∀ j∈Di+1} and the weights are such that:
if P0 is atomic, let Dni−1,ni+1:={(ki−1,ki+1):ki−1≤ni−1,ki+1≤ni+1} and the weights can be expressed as wni−1,ni+1ki−1,ni,ki+1 ∝˜pni−1,ni+1ki−1,ki+1m(ki−1+ni+ki+1)m(ki−1)m(ni)m(ki+1)
where ˜pni−1,ni+1ki−1,ki+1=pni−1,ki−1(ti−ti−1)pni+1,ki+1(ti+1−ti) is the joint transition probability from ni−1 to ki−1 in time ti−ti−1 and from ni+1 to ki+1 in time ti+1−ti and m(⋅) is the marginal likelihood function of multiplicities in the atomic case.
This peculiar structure, developed in Ascolani, Lijoi, and Ruggiero (2023), can be
better understood with some examples. They can be shown in this
implementation with smooth()
. The arguments are two latent
signals (fvddp.past
and fvddp.future
), the
positive times ti−ti−1
(t.past
) and ti+1−ti (t.future
) and the data collected at time
ti (y.new
).
FVDDP_NONATOMIC = initialize(theta = 0.7, sampling.f = function(x) rbeta(x, 4, 7),
density.f = function(x) dbeta(x, 4, 7), atomic = FALSE)
FVDDP_PAST_NONATOMIC = update(fvddp = FVDDP_NONATOMIC, y.new = c(0.210, 0.635, .541))
FVDDP_FUTURE_NONATOMIC = update(fvddp = FVDDP_NONATOMIC, y.new = c(0.210))
FVDDP_FUTURE_NONATOMIC = propagate(fvddp = FVDDP_FUTURE_NONATOMIC, delta.t = 0.4)
FVDDP_FUTURE_NONATOMIC = update(fvddp = FVDDP_FUTURE_NONATOMIC, y.new = c(.635))
In the example above, two process were created with θ=0.7 and P0∼Beta(4,7). The signal was updated once in the past, and twice in the future (with a propagation between the two updates).
FVDDP_SMOOTH_NONATOMIC = smooth(fvddp.past = FVDDP_PAST_NONATOMIC, fvddp.future = FVDDP_FUTURE_NONATOMIC,
t.past = 0.75, t.future = 0.3, y.new = c(0.210, 0.635, 0.479))
FVDDP_SMOOTH_NONATOMIC
#> theta: 0.7
#>
#> P0.sample: function(x) rbeta(x, 4, 7)
#>
#> P0.density/mass: function(x) dbeta(x, 4, 7)
#>
#> is.atomic: FALSE
#>
#> Unique values (y.star):
#> [1] 0.210 0.479 0.541 0.635
#>
#> Multiplicities (M):
#> 0.21 0.479 0.541 0.635
#> [1,] 2 1 0 3
#> [2,] 2 1 1 3
#> [3,] 2 1 2 3
#>
#> Weights (w):
#> [1] 0.23344663 0.03392615 0.73262722
Using the function on the two processes, it is possible to see that the structure described above for the nonatomic case causes a shrinkage of the mixture. Indeed, the set M only contains three vectors.
In order to make a comparison, one can try to do something similar taking P0∼Binom(10,0.6).
FVDDP_ATOMIC = initialize(theta = 0.7, sampling.f = function(x) rbeta(x, 10, 0.6),
density.f = function(x) dbinom(x, 10, 0.6), atomic = TRUE)
FVDDP_PAST_ATOMIC = update(fvddp = FVDDP_ATOMIC, y.new = c(2, 6, 5))
FVDDP_FUTURE_ATOMIC = update(fvddp = FVDDP_ATOMIC, y.new = c(2))
FVDDP_FUTURE_ATOMIC = propagate(fvddp = FVDDP_FUTURE_ATOMIC, delta.t = 0.4)
FVDDP_FUTURE_ATOMIC = update(fvddp = FVDDP_FUTURE_ATOMIC, y.new = c(6))
As before, the mixture referred to past observations is updated once, the one referred to future observations is updated twice.
FVDDP_SMOOTH_ATOMIC = smooth(fvddp.past = FVDDP_PAST_ATOMIC, fvddp.future = FVDDP_FUTURE_ATOMIC,
t.past = 0.75, t.future = 0.3, y.new = c(2, 6, 4))
FVDDP_SMOOTH_ATOMIC
#> theta: 0.7
#>
#> P0.sample: function(x) rbeta(x, 10, 0.6)
#>
#> P0.density/mass: function(x) dbinom(x, 10, 0.6)
#> <bytecode: 0x129a106d8>
#>
#> is.atomic: TRUE
#>
#> Unique values (y.star):
#> [1] 2 4 5 6
#>
#> Multiplicities (M):
#> 2 4 5 6
#> [1,] 1 1 0 1
#> [2,] 1 1 0 2
#> [3,] 1 1 0 3
#> [4,] 1 1 1 1
#> [5,] 1 1 1 2
#> [6,] 1 1 1 3
#> [7,] 1 1 2 1
#> [8,] 1 1 2 2
#> [9,] 1 1 2 3
#> [10,] 2 1 0 1
#> [11,] 2 1 0 2
#> [12,] 2 1 0 3
#> [13,] 2 1 1 1
#> [14,] 2 1 1 2
#> [15,] 2 1 1 3
#> [16,] 2 1 2 1
#> [17,] 2 1 2 2
#> [18,] 2 1 2 3
#>
#> Weights (w):
#> [1] 0.0001721823 0.0025036150 0.0094628016 0.0002320301 0.0024384330
#> [6] 0.0068287189 0.0004682311 0.0037328174 0.0075867038 0.0117827759
#> [11] 0.1266737735 0.3103950448 0.0112750878 0.0913256329 0.1717719912
#> [16] 0.0153587293 0.0979424891 0.1300489424
In this case, the mixture is clearly bigger. The reason is that when P0 is atomic, the set Dni−1,ni+1 does not put constraints based on the appearance of shared values across different times.
Past examples should also provide an insight on the main issue
related to propagate()
and smooth()
: the size
of the matrix M grows polynomially
with respect to the amount of collected data; moreover, it can be seen
that as the number of weights increases, many of them become almost
negligible.
In order to avoid long computations, it is possible to use
approx.propagate()
: it reproduces the propagation of the
signal by means of Monte Carlo method. The idea, proposed by Ascolani, Lijoi, and Ruggiero (2021), is to
mimic the evolution of the K-dimensional death process using a
one-dimensional one, and then extracting a multidimensional vector with
a multivariate hypergeometic distributions.
FVDDP =initialize(theta = 3, sampling.f= function(x) rnorm(x, -1, 3),
density.f = function(x) dnorm(x, -1, 3), atomic = FALSE)
FVDDP = update(fvddp = FVDDP, y.new = c(-1.145, 0.553, 0.553, 0.553))
In the previous chunk, a process with hyperparameters θ=3 and P0∼N(−1,3) was created
and updated. The syntax of the approximating functions is just the same
as in propagate()
, with the exceptions that one must
specify the number of samples N
to be drawn.
FVDDP_APPR_PROP
#> theta: 3
#>
#> P0.sample: function(x) rnorm(x, -1, 3)
#>
#> P0.density/mass: function(x) dnorm(x, -1, 3)
#>
#> is.atomic: FALSE
#>
#> Unique values (y.star):
#> [1] -1.145 0.553
#>
#> Multiplicities (M):
#> -1.145 0.553
#> [1,] 0 0
#> [2,] 0 1
#> [3,] 0 2
#> [4,] 0 3
#> [5,] 1 0
#> [6,] 1 1
#> [7,] 1 2
#> [8,] 1 3
#>
#> Weights (w):
#> [1] 0.13530 0.33260 0.17440 0.02000 0.10310 0.17210 0.05785 0.00465
The results is again an fvddp
object. In order to
measure the accuracy of such approximation, one has to compute the exact
output of the propagation, again with time t=0.45.
Then one can measure the difference in the weights with
error.estimate()
. The arguments are
fvddp.exact
and fvddp.approx
, and the output
is a vector containing the difference among the weights, in absolute
value. The option remove.unmatched
allows to choose
whenever a vector is in the exact propagation but not in the
approximate: if TRUE
, the missing weight is assumed to be
0, if FALSE
, this
comparison will not be reported in the output (which will result to be
shorter).
error.estimate(FVDDP_EXACT_PROP, FVDDP_APPR_PROP)
#> [1] 0.0058286503 0.0068326918 0.0028019247 0.0014386014 0.0008613986
#> [6] 0.0015530747 0.0001989751 0.0001334191
Something similar can be done for the smoothing via
approximate.smooth()
; in this case the Monte Carlo method
is necessary to support importance sampling, where the importances are
given by the right hand side of the formulae for wni−1,ni+1ki−1,ni,ki+1. For
this reason, the result of the simulation may be less stable than in the
case of the propagation seen above, and a larger amount of samples will
be required to achieve a good accuracy.
In the following example, one can see how to copy wht was done in the exact smoothing.
FVDDP_SMOOTH_APPR = approx.smooth(fvddp.past = FVDDP_PAST_ATOMIC, fvddp.future = FVDDP_FUTURE_ATOMIC,
t.past = 0.75, t.future = 0.3, y.new = c(2, 6, 4), N = 50000)
FVDDP_SMOOTH_APPR
#> theta: 0.7
#>
#> P0.sample: function(x) rbeta(x, 10, 0.6)
#>
#> P0.density/mass: function(x) dbinom(x, 10, 0.6)
#> <bytecode: 0x129a106d8>
#>
#> is.atomic: TRUE
#>
#> Unique values (y.star):
#> [1] 2 4 5 6
#>
#> Multiplicities (M):
#> 2 4 5 6
#> [1,] 1 1 0 1
#> [2,] 1 1 0 2
#> [3,] 1 1 0 3
#> [4,] 1 1 1 1
#> [5,] 1 1 1 2
#> [6,] 1 1 1 3
#> [7,] 1 1 2 1
#> [8,] 1 1 2 2
#> [9,] 1 1 2 3
#> [10,] 2 1 0 1
#> [11,] 2 1 0 2
#> [12,] 2 1 0 3
#> [13,] 2 1 1 1
#> [14,] 2 1 1 2
#> [15,] 2 1 1 3
#> [16,] 2 1 2 1
#> [17,] 2 1 2 2
#> [18,] 2 1 2 3
#>
#> Weights (w):
#> [1] 0.0002318445 0.0032076427 0.0117206721 0.0002089701 0.0021740870
#> [6] 0.0057754325 0.0003174570 0.0025240938 0.0051318324 0.0161197332
#> [11] 0.1568137178 0.3993952863 0.0088537112 0.0776609913 0.1427394035
#> [16] 0.0102407155 0.0676997460 0.0891846631
error.estimate(FVDDP_SMOOTH_ATOMIC, FVDDP_SMOOTH_APPR)
#> [1] 5.966216e-05 7.040276e-04 2.257871e-03 2.305999e-05 2.643460e-04
#> [6] 1.053286e-03 1.507740e-04 1.208724e-03 2.454871e-03 4.336957e-03
#> [11] 3.013994e-02 8.900024e-02 2.421377e-03 1.366464e-02 2.903259e-02
#> [16] 5.118014e-03 3.024274e-02 4.086428e-02
Another tool to cut the computational cost of predictive of smoothing
inference is given by the prune()
function. It allows to
remove from the mixture all vectors m whose weight wm is under some treshold ε. Such eights are then
normalized such that their sum is 1.
In the example, the function will be applied to one of the processes prevously calculated, fixing ε=10−2.
PRUNED
#> theta: 0.7
#>
#> P0.sample: function(x) rbeta(x, 10, 0.6)
#>
#> P0.density/mass: function(x) dbinom(x, 10, 0.6)
#> <bytecode: 0x129a106d8>
#>
#> is.atomic: TRUE
#>
#> Unique values (y.star):
#> [1] 2 4 5 6
#>
#> Multiplicities (M):
#> 2 4 5 6
#> [1,] 2 1 0 1
#> [2,] 2 1 0 2
#> [3,] 2 1 0 3
#> [4,] 2 1 1 1
#> [5,] 2 1 1 2
#> [6,] 2 1 1 3
#> [7,] 2 1 2 1
#> [8,] 2 1 2 2
#> [9,] 2 1 2 3
#>
#> Weights (w):
#> [1] 0.01219024 0.13105433 0.32112895 0.01166500 0.09448380 0.17771211 0.01588986
#> [8] 0.10132948 0.13454622
In this context, the treshold is insanely high; this is done for the unique purpose of showing how the function works; in the practical use of the package, a reasonable ε is between 10−9 and the machine epsilon of the computer in use.
The last task that it can be performed is sampling values from Fleming-Viot dependent Dirichlet Processes. This can be done by simply choosing a vector wm and drawing a value from P0 with probability θθ+|m|, or choosing y∗m with probability mjθ+|m|. To get a sample of size N, it is sufficient to replicate this mechanism n times.
The implementation is named posterior.sample()
. Its
arguments are the signal and the number N
of values to
draw.
table(round(y, 3))
#>
#> -8.67 -7.232 -7.197 -6.912 -6.764 -5.968 -5.659 -5.249 -5.19 -5.114 -5.105
#> 1 1 1 1 1 1 1 1 1 1 1
#> -4.826 -4.375 -4.326 -4.21 -4.073 -3.836 -3.728 -3.562 -3.428 -2.793 -2.746
#> 1 1 1 1 1 1 1 1 1 1 1
#> -2.304 -2.162 -2.04 -1.687 -1.624 -1.555 -1.447 -1.286 -1.21 -1.145 -1.07
#> 1 1 1 1 1 1 1 1 1 3 1
#> -0.965 -0.706 -0.656 -0.561 -0.465 -0.462 -0.298 -0.256 -0.134 -0.059 -0.032
#> 1 1 1 1 1 1 1 1 1 1 1
#> 0.128 0.208 0.369 0.384 0.444 0.553 0.652 0.817 0.95 1.014 1.036
#> 1 1 1 1 1 26 1 1 1 1 1
#> 1.092 1.367 1.442 1.535 1.603 2.016 2.233 2.29 2.341 2.447 2.766
#> 1 1 1 1 1 1 1 1 1 1 1
#> 2.959 3.393 3.893 4.828 4.854 5.744 5.833
#> 1 1 1 1 1 1 1
The command table()
was used here to display more
efficiently how many times each value has been sampled.
In the Bayesian Nonparametric framework, scientists prefer to use the predictive structure of the Dirichlet process when they want to picture how future observations will be like. This choice is strongly related to the exchangeability assumption underlying the model (more in Ghosal and Vaart (2017)); in this context, however, it is sufficient to say that predictive structure is nothing but the sequential use of posterior sampling and update. In fact, a value is repeatedly drawn from the mixture and it is incorporated within each vector m∈M via an update. A full description of this mechanism was developed by Ascolani, Lijoi, and Ruggiero (2021).
This is implemented efficiently via predictive.struct()
;
the arguments are the same as in posterior.sample()
.